The correlated Wishart model provides the standard benchmark when analyzing time series of
any kind. Unfortunately, the real case, which is the most relevant one in applications, poses serious
challenges for analytical calculations. Often these challenges are due to square root singularities
which cannot be handled using common random matrix techniques. We present a new way to tackle
this issue. Using supersymmetry, we carry out an analytical study which we support by numerical
simulations. For large but finite matrix dimensions, we show that statistical properties of the fully
correlated real Wishart model generically approach those of a correlated real Wishart model with
doubled matrix dimensions and doubly degenerate empirical eigenvalues. This holds for the local
and global spectral statistics. With Monte Carlo simulations we show that this is even approximately
true for small matrix dimensions. We explicitly investigate the k-point correlation function as well
as the distribution of the largest eigenvalue, for which we find a surprisingly compact formula in the
doubly degenerate case. Moreover, we show that on the local scale the k-point correlation function
exhibits the sine and the Airy kernel in the bulk and at the soft edges, respectively. We also address
the positions and the fluctuations of the possible outliers in the data.

PACS numbers: 05.45.Tp, 02.50.-r, 02.20.-a 


I. INTRODUCTION 

Random matrix theory was first introduced in biostatistics by Wishart [1] and later on also by Wigner in the
context of Hamiltonian systems [2]. It has extraordinary power to model and study generic features in a variety
of systems, see Ref. @]. It only employs basis invariance and global symmetries of the matrices, resulting in the
orthogonal, unitary and symplectic ensembles [5]. Wishart's ideas opened a new direction in time series analysis and
statistical inference [SIMfTS]. The Wishart model is widely used, including applications in fields such as medicine [10],
biophysics [11], chemistry [12], finance [131E], wireless communication [T3], to mention just a few. The Wishart model
shares the unique advantage of all random matrix approaches: Most of its predictions are accessible in experiments
or observations and can therefore directly be tested. Although the random matrix theory setup is straightforward,
calculations are often difficult. The real case, which is the most relevant one for applications, is particularly cumbersome.

We thus focus on the case of $p \times n$ rectangular matrices $W$ with real entries $W_{j\nu} \in \mathbb{R}$ for $j = 1, \ldots, p$ and $\nu = 1, \ldots, n$.
The $p$ rows of $W$ may be viewed as model time series of length $n$. We assume a Gaussian distribution 13IZ],

$$ P(W|C) \sim \exp\left( -\frac{n}{2}\, \mathrm{tr}\, W^T C^{-1} W \right) , \qquad (1) $$

where the $p \times p$ matrix $C$ is the empirical correlation matrix specific for the data under consideration. This matrix
is input of the model and is required to be real symmetric with positive eigenvalues $\Lambda_i$, $i = 1, \ldots, p$. In particular we
have $C = V \Lambda V^T$ with $V \in \mathrm{O}(p)$ and $\Lambda = \mathrm{diag}(\Lambda_1, \ldots, \Lambda_p)$. The positive definite $p \times p$ matrix $WW^T$ is the model
correlation matrix and, due to our choice of $P(W|C)$, it is on average $\langle WW^T \rangle = C$.
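The statement that the model correlation matrix reproduces $C$ on average can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it assumes that the columns of $W$ are independent Gaussian vectors with covariance $C/n$, a normalization chosen here so that the average of $WW^T$ equals $C$. The function names and the $2 \times 2$ example matrix are hypothetical.

```python
import numpy as np

def sample_data_matrix(C, n, rng):
    """Draw one p x n matrix W whose columns are i.i.d. N(0, C/n) (assumed normalization)."""
    A = np.linalg.cholesky(C)                 # C = A A^T
    G = rng.standard_normal((C.shape[0], n))  # independent N(0, 1) entries
    return A @ G / np.sqrt(n)

def average_model_correlation(C, n, samples, seed=0):
    """Monte Carlo estimate of the ensemble average of W W^T."""
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(C)
    for _ in range(samples):
        W = sample_data_matrix(C, n, rng)
        acc += W @ W.T
    return acc / samples

# Hypothetical empirical correlation matrix used only for illustration.
C = np.array([[1.0, 0.3],
              [0.3, 2.0]])
avg = average_model_correlation(C, n=50, samples=4000)
max_dev = np.abs(avg - C).max()   # should be small compared to the entries of C
```

The deviation `max_dev` is pure Monte Carlo noise and shrinks as the number of samples grows.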

In applications of the real Wishart model, correlated or not, square roots of characteristic polynomials and therefore
branch cuts arise. For instance, gap probabilities related to the smallest and largest eigenvalue were found to possess a
representation as averaged products of determinants in the denominator to half-integer power m-. Other examples are
the eigenvalue density in the ordinary and doubly correlated Wishart model [T8ll21j, the distribution of the smallest
eigenvalue as well as universality considerations in scattering theory [13, 17]. Those square roots are serious
obstacles in analytical calculations and a solution is urgently called for. To the best of our knowledge, a comprehensive
analytical strategy for averages over a product of characteristic polynomials to half-integer power does not exist. For
certain special cases some solutions are known 11113 HZ]-

The analytical calculations drastically simplify in the case that the empirical correlation matrix becomes doubly
degenerate, because the square roots are not present anymore. Although this case is empirically rarely justified, our
results provide very good approximations for the case without such degeneracies. Our main goal is a general approach
to eigenvalue statistics in the real correlated Wishart model, which to some extent outmanoeuvres the square roots
of characteristic polynomials such that standard random matrix techniques apply. Based on analytical calculations
using supersymmetry [28, 29] and on numerical simulations we verify that most of the statistical properties in the
bulk, at the edges and for the outliers of an arbitrary, correlated real Wishart ensemble do not depend on the degree
of the degeneracy of the empirical correlation matrix. In particular, the spectral observables of a $p \times n$ random matrix
$W$ correlated with $C$ coincide with those of an $lp \times ln$ random matrix correlated with $C \otimes 1_l$, where $l \in \mathbb{N}$ is the degree
of degeneracy and $1_l$ is the $l$-dimensional identity matrix. This statement becomes exact for

$$ 0 < \frac{p}{n} = \gamma^2 \le 1 \quad \text{and} \quad n, p \to \infty \qquad (2) $$

under very moderate assumptions on the empirical correlation matrix $C$. The eigenvalue density of correlated Wishart
ensembles with non-degenerate spectrum was already studied by many others in Refs. [30–33]. We will regain their results
and additionally we derive results about the local spectral statistics.
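The claimed insensitivity to the degree of degeneracy can be probed with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the columns of $W$ are taken as independent Gaussian vectors with covariance $C/n$, the diagonal matrix `C` and the sizes `p`, `n` are arbitrary choices, and the second spectral moment is used as a simple global observable. It compares the $p \times n$ ensemble with the $2p \times 2n$ ensemble correlated with $C \otimes 1_2$.

```python
import numpy as np

def second_moment(C, n, samples, rng):
    """Monte Carlo estimate of <tr (W W^T)^2> / p for columns of W ~ N(0, C/n)."""
    p = C.shape[0]
    A = np.linalg.cholesky(C)
    total = 0.0
    for _ in range(samples):
        W = A @ rng.standard_normal((p, n)) / np.sqrt(n)
        ev = np.linalg.eigvalsh(W @ W.T)   # eigenvalues of the model correlation matrix
        total += np.sum(ev**2) / p
    return total / samples

rng = np.random.default_rng(1)
p, n = 10, 20                               # illustrative sizes, gamma^2 = p/n = 1/2
C = np.diag(np.linspace(0.5, 2.0, p))       # hypothetical empirical eigenvalues
C_deg = np.kron(C, np.eye(2))               # doubly degenerate version, C (x) 1_2

m2 = second_moment(C, n, 2000, rng)
m2_deg = second_moment(C_deg, 2 * n, 2000, rng)
rel_diff = abs(m2 - m2_deg) / m2            # finite-size difference, expected of order 1/n
```

For Gaussian data the two moments differ only by a term of relative order $1/n$, so `rel_diff` should already be small at these modest dimensions.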

As a by-product we also derive the sine and the Airy kernel for real matrices in the bulk and at the soft edges,
respectively, for the fully correlated case. Importantly, we properly account for all Efetov-Wegner boundary contributions
[34–36], which often pose substantial difficulties in supersymmetry calculations. To this end we apply Rothstein's
theory [37] and identify the results with those for the Gaussian Orthogonal Ensemble (GOE). We include outliers and
discuss their positions and fluctuations, provided they are well-separated from all other eigenvalues.

We show that most of the spectral observables are independent of the degree of degeneracy, and we thus claim that
the distributions of the largest eigenvalue for the correlated real Wishart ensembles with the empirical correlation
matrices $C$ and $C \otimes 1_2$ are approximately equal in the limit of large matrix dimensions $p$ and $n$. We derive a
representation of the cumulative density function in terms of a $p \times p$ Pfaffian ($p$ even) for the $2p \times 2n$ Wishart ensemble
with $C \otimes 1_2$. For this purpose we start from an earlier result im and employ skew-orthogonal polynomials [38].

Although the results are derived in an asymptotic limit, we find surprisingly good agreement with numerical
simulations already for rather small matrix dimensions. This allows a quantitative as well as a qualitative spectral
analysis in the Wishart model without doubly degenerate empirical eigenvalues if $p/n \sim O(1)$ and $n, p$ are large.

Our study is structured as follows. In section II we summarize the basics of the $k$-point function and present
the corresponding supermatrix model. We also discuss the conditions on $C$ to ensure that the limit $n, p \to \infty$ with
$p/n \in [0, 1]$ is well-defined. The saddle point approximation of the supermatrix model is performed in section III,
in which we also derive a simple general relation between the macroscopic level density (marginal density) and the
saddle point solution. Furthermore, we study the bulk and the edges of the spectrum and derive the sine kernel on
the local scale. In section IV we investigate possible outliers and the local statistics of the soft edges and derive the
Airy kernel. We also manage to express the cumulative distribution of the largest eigenvalue in terms of a Pfaffian
determinant. For illustrative purposes and to confirm our claims, we perform numerical simulations in section V. We
conclude in section VI. A brief sketch of Rothstein's theory [37] is relegated to appendix A.


II. SUPERSYMMETRIC REPRESENTATION OF THE k-POINT CORRELATION FUNCTIONS


The $k$-point correlation function $R_k(x; \xi)$ with $k < p$ measures the eigenvalue fluctuations of the model correlation
matrix $WW^T$. We use two sets of variables $x = \mathrm{diag}(x_1, \ldots, x_k)$ and $\xi = \mathrm{diag}(\xi_1, \ldots, \xi_k)$ for later separation of
the global and the local scales, respectively. To study the local scale, we unfold the spectrum with the level density
which depends on the empirical eigenvalues $\Lambda = \mathrm{diag}(\Lambda_1, \ldots, \Lambda_p)$. Importantly, on the original and on the unfolded
scale, all $k$-point functions may depend non-trivially on these empirical eigenvalues. One of the main results to
be derived below is the emergence of the universal statistical features of the uncorrelated Wishart ensemble after
unfolding and under modest conditions on $\Lambda$. Furthermore, an arbitrary degeneracy of degree $l \in \mathbb{N}$ of the empirical
eigenvalues, $\Lambda \to \Lambda \otimes 1_l$, does not change the statistics. Even the global level density $R_1(x)$ remains the same for large
matrix dimensions $n, p \to \infty$. In sections II A and II B we set up the supermatrix model and test the asymptotics,
respectively.
A. Setting up the supermatrix model


To be as general as possible, we consider an ensemble of Wishart matrices $W$ of size $lp \times ln$ drawn from the normal
distribution (1), where the eigenvalues of $C$ are $l$-fold degenerate, i.e. the empirical eigenvalues are $\Lambda \otimes 1_l$. For this
ensemble we analyze its $k$-point correlation function, which is expressed as the derivative of a generating function,


Rk{x;0 


1 


LG{±1}'' i=l 


6 —f 0 
j=0 


(3) 


where $\kappa_{b,1} = x_b + j_b + \xi_b/(lp) + iL_b\varepsilon$, $\kappa_{b,2} = x_b - j_b + \xi_b/(lp) + iL_b\varepsilon$ and $L_b = \pm 1$ for $b = 1, \ldots, k$. The generating
function also depends on source variables $j = \mathrm{diag}(j_1, \ldots, j_k)$. The scaling of the variables $\xi$ with $lp$ anticipates the
local scale for spectral fluctuations inside the macroscopic bulk in which the unscaled variables $x_a$ are assumed to
lie. The latter variables $x_a$ may also be degenerate, i.e. $x_a = x_b$ for some $a, b = 1, \ldots, p$, as long as the eigenvalues
$x_a + \xi_a/(lp)$ are pairwise different. The scaling of $\xi$ has to be adjusted when one or more of the variables $x_a$ are at
an edge of the spectrum. The generating function reads


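$$ Z^{(k)}_{lp,ln}(\kappa) = \int d[W]\, P(W|C) \prod_{b=1}^{k} \frac{\det\left( WW^T - \kappa_{b,2}\, 1_{lp} \right)}{\det\left( WW^T - \kappa_{b,1}\, 1_{lp} \right)} , \qquad (4) $$

with $d[\cdot]$ being the flat measure, i.e. the product of all independent differentials. The matrix $1_{lp}$ is the $lp$-dimensional
identity matrix.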

To conveniently study the asymptotics for large $n, p$ with $p/n = \gamma^2$ fixed, we employ the supersymmetry method,
see Refs. [28, 29, 36, 39, 40]. A more mathematical introduction into superanalysis can be found in Ref. [42]. Using
the results in Refs. [20, 22, 43], we map the generating function $Z$ to superspace,

=Kni,kS'det J d[tT] sdet ^"^“^^/^crexp Raj sdet (ip 0 l 2 k\ 2 k +iA(S> a) , (5) 

where R = diag (ki,i, ..., Hk.i, ki, 2 , ■ ■ ■, Kk, 2 )'Z)l 2 is viewed as a (2A:|2fc) x (2fc|2A:) diagonal supermatrix. The factor of 2 
in the dimensions occurs because we study the real correlated Wishart ensemble. The (2A:|2fc) x (2A:|2fc) supermatrix La 
has a positive definite symmetric matrix in the boson-boson block ctbb while the fermion-fermion block ctff belongs 
to the circular symplectic ensemble [43ti46] . The boson-fermion block ctbf = {r]ab,r]ab}a='L,. .,' 2 k-b-i,...,k consists of 
2k real independent Grassmann variables and the fermion-boson block is erpB = “O’bf the dagger denoting the 
ordinary adjoint. Here we have employed the supermatrix L = diag (Li,..., Lj.) 0 Ili|i 0 II 2 encoding the signs of the 
imaginary increment e. The normalization constant 

K~i k= y d[(T]sdet'^"'“^)/^crexp ^-yStrZer^ , ( 6 ) 

is determined by the condition that —>• 1 for £ —>• 00 . By construction, we also have zjfj^\K)\j=o = 1 for 

vanishing source variables. To show the non-trivial equality of the integral ® for the normalization constant and 
the integral ([^ for j = 0 , one needs Cauchy-like integral theorems [Ml ES] lTHj4^ first derived by Wegner [35] for 
arbitrary supermatrix sizes. The measure d[cr] is the product of all differentials of the independent variables. The 
integration over Grassmann variables are normalized as 

$$ \int d\eta = 0 , \qquad \int \eta\, d\eta = 1 , \qquad (7) $$

which differs from another convention by a factor of -s/^. With this choice the constant Kni,k becomes in the large 
n limit 


Aoo,fc = lim Kai,k = (g) 

n—>-oo 

because the integrand can be expanded around ao = L yielding a Gaussian integral. 

In the supermatrix representation (5) we differentiate with respect to the source variables $j_a$ and set them to zero.
Then we perform a $1/p$ expansion by means of a saddle point approximation. We expand around the saddle point
matrix $\sigma_0$ according to $\sigma = \sigma_0 + \delta\sigma/\sqrt{p}$, where the scaling $1/\sqrt{p}$ of the massive modes $\delta\sigma$ is dictated by the fact that all
variables $x_a$ are in the bulk of the spectrum. After keeping only the leading order term we find


Rk{x,0 =Kni,k lim / d[cro,<5CT] 
E^OJ 


Li,...,Lfc=±l 


2 ^ 


da 

ao + — 

Vp 


k 


xn 






012 + 


7-^ - 1 L, \ 

27rz Xj + iLj£ J 



(9) 


with $\gamma^2 = p/n$. Here is a $k \times k$ matrix with zeros everywhere and unity in the $(a, b)$ entry. For the time being,
neither the saddle point manifold of $\sigma_0$, referred to as Goldstone modes, nor the support of the massive modes $\delta\sigma$ are
precisely specified. The second term $1/(x_j + iL_j\varepsilon)$ in the above product is reminiscent of the superdeterminant in front
of the integral (5). It generates Dirac $\delta$ functions $\delta(x_j)$ which have the following origin: To derive the expression (5)
we used $WW^T$ instead of $W^TW$. Their spectra only differ in the number of the generic zero eigenvalues, which is
equal to $n - p = (\gamma^{-2} - 1)p$ for $W^TW$ and zero for $WW^T$. We return to these terms in subsection III B. Keeping
with the common terminology, we refer to the function


C(a) =— str In (l 2 k\ 2 k + — zstr { x + leL str In cr. 

p ^ \ pi J \ pi J 


( 10 ) 


in the above expression as “Lagrangian”. 


B. Testing the limit of large matrix dimensions 


We now show that the limit p, n —)■ 00 with 0 < 7 ^ = p/n < 1 fixed is well-defined, because |sdet ^ (1 + lAia) \ is 


bounded. For the numerical part d of tr we have 


sdet ■ 


A,; 


^2k\2k 


2 An 




|l + ^A,e*‘^V(2A„,ax)|' 


n?=i|l+*^^-AzeG/{2Aa,ax)| 


( 11 ) 


where e® = Ldiag (e®b..., e® 2 '')/( 2 Aiiiax) are the eigenvalues of the boson-boson block ctbb of a and = 
diag ..., e*^'') (S> l 2 /( 2 Amax) are the eigenvalues of the fermion-fermion block ctff- Here we rescaled a —>■ 
CT/(2Aniax) with Amax being the largest of the empirical eigenvalues A. The expression (111 is bounded from below 
and above according to 


0 < 


(1 - A,/(2A„,ax)) 


2k 


n,"i(l + e 2 GA,/( 2 A„,ax)) 


< 


sdet ■ 


A,, 


^2k\2k ■ 


2K 


< 1 + 


A,, 


2 A„ 


2k 


< 00 . 


( 12 ) 


These bounds are integrable due to the terms exp[—nlee^^ /2] and and due to A: < p < n in the integrand. 

This estimate only holds for the part of nr=i |sdet (l 2 fc| 2 fc + *Aicr) | without the Grassmann variables. An expansion 
in the Grassmann variables yields a finite polynomial in powers of the matrices 


1 ^ 

n 


^ (2Amax/Ail2fc +^CrBB) 


(2Aniax/ ^i'^2k H“ 


0 m 


(13) 


which are contracted in the generating function ([5 1, the details do not matter. The tensor product multiplies the 
space corresponding to the 2k x 2k boson-boson block with the one corresponding to the 2k x 2k fermion-fermion 
block. The exponent m = 0,..., 2/c^ is taken in a tensor sense, too. The modulus of the spectrum of these matrices 
are bounded from above by 2“^"* independent of A and a. Therefore the limit p, n —> 00 with 0 < 7 ^ = p/n < 1 fixed 
is well-defined if we assume that 


$$ \lim_{p\to\infty} \frac{1}{p} \sum_{i=1}^{p} \ln(1 + s\Lambda_i) < \infty \qquad (14) $$

remains finite for any $s > -1/\Lambda_{\max}$ and in the case that $\Lambda_{\max}/\Lambda_i$ also remains finite. This is realized when the
smallest eigenvalue is of the same order as the largest eigenvalue $\Lambda_{\max}$.
If $\Lambda$ contains a finite number $p_{\rm out}$ of outliers of larger order than the ones in the bulk, we may still resort to the
discussion above. We split the product of superdeterminants in two parts,

$$ \prod_{i=1}^{p} \mathrm{sdet}\left( 1_{2k|2k} + i\Lambda_i\sigma \right) = \prod_{i=1}^{p-p_{\rm out}} \mathrm{sdet}\left( 1_{2k|2k} + i\Lambda_i\sigma \right) \prod_{i=p-p_{\rm out}+1}^{p} \mathrm{sdet}\left( 1_{2k|2k} + i\Lambda_i\sigma \right) . \qquad (15) $$

Only the first product enters the saddle point equation to be given in the sequel, while the second one may be
considered as a $p$-independent perturbation of the integrand. The second product cannot contribute to the saddle
point analysis since the number of outliers $p_{\rm out}$ is assumed to be fixed. The physical interpretation is that outliers
which are macroscopically separated from, or may even lie on a scale larger than that of, the bulk do not influence the
statistics in the bulk. We study the outliers in more detail in subsection IV A.

Another remark is in order, clarifying how the existence of a limiting distribution $\rho(\lambda)$ for the empirical eigenvalues
$\Lambda$ affects the above discussion. Such a distribution exists if

$$ \lim_{p\to\infty} \frac{1}{p} \sum_{i=1}^{p} f(\Lambda_i) = \int_0^\infty f(\lambda)\, \rho(\lambda)\, d\lambda , \qquad (16) $$

whenever the test function $f$ is integrable with respect to $\rho$ and $f(\Lambda_i) < \infty$. The sum in the Lagrangian (10) is then
bounded from above and below by the average of the integral of the supermatrix resolvent,

$$ \lim_{p\to\infty} \frac{1}{p} \sum_{i=1}^{p} \mathrm{str}\, \ln\left( 1_{2k|2k} + i\Lambda_i\sigma \right) = \int \mathrm{str}\, \ln\left( 1_{2k|2k} + i\lambda\sigma \right) \rho(\lambda)\, d\lambda . \qquad (17) $$

Outliers appear as Dirac $\delta$ functions in $\rho$. Although Eqs. (16) and (17) are only valid if a limiting distribution for
the empirical eigenvalues $\Lambda$ exists, we want to find an expression which still provides a good approximation at finite
matrix dimensions $p$ and $n$.
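The convergence of the empirical average to the integral over a limiting distribution can be illustrated with a simple example. The sketch below rests on assumptions made purely for illustration: the empirical eigenvalues are placed at the midpoints of a regular grid on $[1, 2]$, so that the limiting density is uniform on that interval, and the test function is $f(\lambda) = \ln(1 + s\lambda)$ with an arbitrary value of `s`.

```python
import numpy as np

def empirical_average(f, lam):
    """Left-hand side: (1/p) * sum_i f(Lambda_i)."""
    return np.mean(f(lam))

def exact_integral(s):
    """Right-hand side for the uniform density on [1, 2]: integral of ln(1 + s*lam)."""
    F = lambda lam: ((1.0 + s * lam) * np.log1p(s * lam) - (1.0 + s * lam)) / s
    return F(2.0) - F(1.0)

s = 0.7                                    # arbitrary illustrative parameter
p = 1000
lam = 1.0 + (np.arange(p) + 0.5) / p       # hypothetical empirical eigenvalues on [1, 2]
f = lambda x: np.log1p(s * x)

err = abs(empirical_average(f, lam) - exact_integral(s))
```

In this regular-grid example the error decays like $1/p^2$; for randomly drawn empirical eigenvalues one would expect the slower $1/\sqrt{p}$ decay instead.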


III. BULK STATISTICS 


We analyze the bulk statistics in three steps. First we discuss the saddle point approximation for a general $k$-point
correlation function in section III A. In section III B an explicit and very simple relation between the macroscopic
level density and the saddle point solution is presented. In section III C we show that inside the bulk the whole
spectral statistics on the local scale agrees with the sine kernel of real matrices. This is true and exact for all $k$-point
correlation functions including the cumbersome Efetov-Wegner boundary terms, see Refs. [sniiii].


A. Saddle point approximation in the bulk 


We now show that, assuming the condition (14), the $k$-point correlation function is independent of the degree $l$
of degeneracy in leading order of a $1/p$ expansion. To this end, we carry out a saddle point approximation of the
integral ^ by expanding the Lagrangian (10) up to the order 1/p, 


C 


V + X! (,^2k\2k + lAiffo) - istr (^x + iei}j cto - ^1 - ^ j str In 




i=l 

I 


I str 


„2 P 


A,; 


— > ‘ — X — leL + zctq 

P ^ ^2k\2k + lAiUo 


1 

—str 

2p 


1=1 
2 P 

-E 

p h 


Sa 


A,; 


l2fe|2fc + *AitTo 


6a + (o-Q ^6a)^ - ^y|cro 


$+\, O(p^{-3/2})$ . (18)

The term of first order in $\delta\sigma$ yields the supermatrix valued saddle point equation

$$ xQ + 1_{2k|2k} - \frac{\gamma^2}{p} \sum_{i=1}^{p} \frac{\Lambda_i Q}{1_{2k|2k} + \Lambda_i Q} = 0 , \qquad (19) $$
FIG. 1: (Color online) Schematic asymptotic behaviour of the rational function $g(q, x)$ (solid curves) at the singularities (dashed)
and at infinity, cf. Eq. (20). The variable $x$ stands for any eigenvalue $x_1, \ldots, x_k$ of the correlated Wishart matrix $WW^T$. The
qualitative convexity properties for $q > -1/\Lambda_p$ are also shown. However, the behaviour for $q < -1/\Lambda_p$ strongly depends on the
parameter $\gamma^2 = p/n$ and the empirical eigenvalues $\Lambda_a$. Concretely, we only have a maximum above the horizontal green line
for $q < -1/\Lambda_1$ when $\gamma^2 < 1$ (left figure) and no maximum at all for $\gamma^2 = 1$ (right figure). In the latter case $g$ approaches the
value $-1$ for $q \to -\infty$ from below instead of from above. Moreover, we may have maximally one maximum and one minimum
inside any of the intervals $]-1/\Lambda_j, -1/\Lambda_{j+1}[$ or none at all, depending on the distance between the individual $\Lambda_a$.


where $Q$ denotes the solution. We neglect the term in $\varepsilon$ as it is infinitesimal. The difficulty is that the saddle point
solution $Q = i\sigma_0$ depends in a highly non-trivial way on the empirical eigenvalues $\Lambda_i$.

The saddle point equation is essentially scalar, as may be seen by taking the commutator of Eq. (19) with $Q^{-1}$.
We obtain $x = Q^{-1} x Q$, implying that $Q$ and $x$ commute. Thus, we can analyze Eq. (19) in the space of the
eigenvalues of $Q$. There are two kinds of eigenvalues, namely $q^{(b)} = \mathrm{diag}(q_1^{(b)}, \ldots, q_{2k}^{(b)})$ in the boson-boson block
and $q^{(f)} = \mathrm{diag}(q_1^{(f)}, \ldots, q_k^{(f)}) \otimes 1_2$ in the fermion-fermion block. The double degeneracy of the latter is the Kramers
degeneracy for quaternion matrices. The integration domain is non-compact for $q^{(b)}$, $q_j^{(b)} \in iL_j\mathbb{R}_+$ with $L_j = \pm 1$,
and compact for $q^{(f)}$, $q_j^{(f)} \in \mathrm{U}(1)$. Hence we only need to analyze the scalar saddle point equation


$$ 0 = -x_a - \frac{1}{q(x_a)} + \frac{\gamma^2}{p} \sum_{i=1}^{p} \frac{\Lambda_i}{1 + \Lambda_i q(x_a)} = -x_a + g(q(x_a)) = g(q(x_a), x_a) , \qquad (20) $$


where we introduce the functions $q(x_a)$, $g(q(x_a)) = g(q(x_a), 0)$ and $g(q(x_a), x_a)$. Anticipating the following discussion,
we mention that the level density is directly related to the solutions of this equation. It is a classical result in high
dimensional inference [30U^, where it was derived by other means. Marčenko and Pastur [30] showed that, if this
equation has a solution in the upper half-plane, this solution is unique, which we denote by $q_0(x_a)$. Although Marčenko
and Pastur already discussed the following setting in more generality, namely for an arbitrary density $\rho$ in Eq. (16),
we want to analyze the rational function $g(q)$ at finite matrix dimension $p$, in particular its singularities. We need
results of this discussion for the analysis of the spectral statistics on the level of the local level spacing.

The equation $g(q(x_a), x_a) = 0$ has $p + 1$ roots for each $x_a$. Moreover, the function $g(q)$ is singular at $q = -1/\Lambda_l$ for
$l = 1, \ldots, p$ and at $q = 0$. An asymptotic analysis of the singularities yields

$$ \lim_{q \to 0^{\pm}} g(q) = \mp\infty , \qquad \lim_{q \to (-1/\Lambda_l)^{\pm}} g(q) = \pm\infty , \qquad (21) $$

where $\pm$ indicates the limit from above or below, respectively. For $q \to \pm\infty$ we obtain

$$ g(q(x_a), x_a) \approx -x_a - \frac{1 - \gamma^2}{q} - \sum_{i=1}^{p} \frac{\gamma^2}{p \Lambda_i q^2} \qquad (22) $$

with $\gamma^2 = p/n < 1$. Figure 1 shows the asymptotic behaviour of $g(q, x)$. As there is at least one real root of $g(q)$
within each interval $(-1/\Lambda_{l+1}, -1/\Lambda_l)$ for $l = 1, \ldots, p - 1$, at least $p - 1$ out of $p + 1$ roots are real. Since Eq. (20)
is real, the complex conjugate $q_0^*(x_a)$ of a solution $q_0(x_a)$ solves Eq. (20) as well. Hence, the remaining two roots are
either a complex conjugate pair or both real.
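The root counting can be made concrete numerically. The following sketch assumes that the scalar function has the Marčenko-Pastur form $g(q) = -1/q + (\gamma^2/p) \sum_i \Lambda_i/(1 + \Lambda_i q)$, which is consistent with the singularities and asymptotics described above, and turns $g(q) - x = 0$ into a polynomial of degree $p + 1$ by multiplying with $q \prod_i (1 + \Lambda_i q)$. The function names and the four empirical eigenvalues are hypothetical choices.

```python
import numpy as np

def g(q, lam, gamma2):
    """Assumed form: g(q) = -1/q + (gamma2/p) * sum_i lam_i / (1 + lam_i * q)."""
    lam = np.asarray(lam, dtype=float)
    return -1.0 / q + gamma2 * np.mean(lam / (1.0 + lam * q))

def saddle_point_roots(x, lam, gamma2):
    """All p + 1 roots q of g(q) - x = 0, obtained as polynomial roots."""
    lam = np.asarray(lam, dtype=float)
    p = lam.size
    F = np.poly1d([1.0])
    for l in lam:
        F = F * np.poly1d([l, 1.0])        # F(q) = prod_i (1 + lam_i q)
    q = np.poly1d([1.0, 0.0])
    # Multiply g(q) - x = 0 by q F(q); sum_i lam_i prod_{j != i} (1 + lam_j q) equals F'(q).
    poly = -(x * q + 1.0) * F + (gamma2 / p) * q * F.deriv()
    return poly.roots

lam = [0.5, 1.0, 1.5, 2.0]                 # hypothetical distinct empirical eigenvalues
gamma2 = 0.5
x = 1.0
roots = saddle_point_roots(x, lam, gamma2)
n_real = int(np.sum(np.abs(roots.imag) < 1e-9))
residual = max(abs(g(r, lam, gamma2) - x) for r in roots)
```

Here `n_real` counts the real roots, which by the interlacing argument should be at least $p - 1$, and `residual` confirms that every polynomial root indeed solves the original rational equation.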

When a complex conjugate pair solves the saddle point equation, the eigenvalues $q_j^{(b)}$ in the boson-boson block of
the supermatrix $Q$ can only reach those solutions which share the same sign of the imaginary part with $x_a + iL_a\varepsilon$.
This is due to the infinitely high potential walls around the singularities $q = -1/\Lambda_l$ when $p \to \infty$. In contrast, the
eigenvalues $q_j^{(f)}$ in the fermion-fermion block reach both saddle points. When diagonalizing the supermatrix $Q$ we
obtain the Berezinian, i.e., the superspace Jacobian,




|A2fe(g(‘’))|3A6(g(f)) 

A 2 ,(g(b);g(t)) 


(23) 


with the Vandermonde determinant Plugging the two kinds of saddle points into this 

Berezinian, we observe the following. Solutions in which the eigenvalues of the boson-boson block and the fermion- 
fermion block do not agree are algebraically suppressed by factors of 1/p and thus smaller than those in which the 
spectra of the boson-boson and the fermion-fermion blocks counted with multiplicities coincide. 

In the case that all solutions are real we may reach more than one saddle point with the boson-boson block of $Q$.
However, only one of all real saddle points contributes, as can be seen by considering the first and second derivative
of $g$ (second and third derivative of the Lagrangian (10)), which read


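$$ g'(q) = \frac{1}{q^2} - \frac{\gamma^2}{p} \sum_{i=1}^{p} \frac{\Lambda_i^2}{(1 + \Lambda_i q)^2} , \qquad g''(q) = -\frac{2}{q^3} + \frac{2\gamma^2}{p} \sum_{i=1}^{p} \frac{\Lambda_i^3}{(1 + \Lambda_i q)^3} . \qquad (24) $$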


For g G] — 1/Ap,0[ the estimate 

g"iq) > 37 'A3 - A > 3(^2 ^ > 0 (25) 

holds, because of (g + I/Aj)“3 > for all j = l,...,p. Hence, g is concave in this interval, cf. Fig. We find a 
similar estimate for g > 0 , namely 


<?"(?) < 3^^^ < 0 (26) 

implying that g is convex on the positive real line. We recall that 7 ^ < 1 because of p < n. 

When $q$ is between two empirical eigenvalues, in particular $q \in\, ]-1/\Lambda_j, -1/\Lambda_{j+1}[$ with $j = 1, \ldots, p - 1$, we find
either no extremum or a single pair of a minimum and a maximum. This is so for the following reason.
The curvature of $g'$ is the third derivative of $g$,


g^^Hg) 1 7^y- 

9 ^ P“(l + Aig)4 


6 


< 


-V 

n ^^ 


2 P 


2 P 


Af 


. . - — V 

P V P^(1 + Az9)'‘ 

2 

1 




hi 


<max< 0 , p^(i_^Aig )4 

< 0 


(27) 


for all $q$ satisfying $g'(q) < 0$. Hence $g'(q)$ is convex in these regimes and it may have either no or two zero points
(counted with multiplicities) in the intervals $]-1/\Lambda_j, -1/\Lambda_{j+1}[$, implying the extrema for $g(q)$. In the second, third and
fourth line of Eq. (27) we employed the assumption $g'(q) < 0$, the fact that the right hand side is a concave function
in $\gamma^2 \in [0, 1]$ and that the second term in the maximization is the negative variance of the sequence $\Lambda_j^2/(1 + \Lambda_j q)^2$,
$j = 1, \ldots, p$. The estimate (27) also tells us that for $q < -1/\Lambda_1$ the function has either a single maximum or none at
all, depending on whether $\gamma^2 < 1$ or $\gamma^2 = 1$, respectively.

Altogether, we know that the function $g$ has at each real value $x$ either $p - 1$ or $p + 1$ real zeros counted with
multiplicities. Only those solutions $q_0(x)$ where $g(q_0(x))$ has a positive slope correspond to a minimum of the Lagrangian
(10) in the eigenvalues of the boson-boson block of $-Qx^{-1} = -i\sigma x^{-1}$. The asymptotic behaviour and the
convexity properties of the rational function $g$ imply that in the case of $p + 1$ real solutions only one of those solutions
has a positive slope at $g(q_0(x))$. Consequently, the non-compact integrals are evaluated at this saddle point only,
regardless of what sign $L_a$ is chosen. A similar argument holds for the compact integrals over $q_j^{(f)}$ in the fermion-fermion
block, which sees exactly the same point as a minimum as the eigenvalues $q_j^{(b)}$, and all other real solutions appear as
maxima in the Lagrangian (10). Despite the fact that the contours of $q^{(b)}$ and $q^{(f)}$ orthogonally cross each other at
the saddle point $q_0(x)$, the opposite sign in the supertrace renders the saddle point for both contours a minimum.
Summing over $L$, we notice that those terms where the contributing saddle point solution is real vanish
because the contributing saddle point is independent of the corresponding sign. Therefore only the complex solutions
contribute. We thus omit all real roots in the following.

Since Q and x commute, we may choose an appropriate block diagonal basis, Q = diag ..., where a < k 
is the number of distinct points Xa, and discuss the resulting saddle point equation for each block separately. The size 
of a single block depends on the degeneracy rrio of the point x^°'^ in question, o = 1,..., a, ie. if xq = Xq = • • • = Xi^ 
the corresponding block is of dimension (2mo|2mo) in superspace. By <8) liq ® I 2 we denote the projection 

of L onto the block corresponding to the point x^°\ The resulting saddle point equation is invariant with respect 
to UOSp(L^^^) X • • • X UOSp(L(“)), where UOSp(L^°^) is the group of pseudo-unitary orthosymplectic matrices T 
with the property Tdiag(L(°) (g) t 2 \'^ 2 mo)T^ = diag{L^°^ g) Hence, instead of isolated saddle points we 

obtain saddle point manifolds, see Refs. [551 UHl ESI ESI lU in another context. From the discussion above, we have 
to integrate Q = lao = diag ■ ■ ■, with 

=Re qo{x^°^)lm, + iT^°hm qo (^x^°'> + 

«Re + ilm qo{x^°^) (28) 


over the coset 


G UOSp(L(°))/[UOSp(2to[,°^|2to^°^) X UOSp(2m^°)|2m^°))] 


(29) 


which parametrizes the “Goldstone modes”. The variables 2mg°^ and 2mJ°^ are the numbers of +l’s and —I’s in 
such that mg°^ + m^i'^ = mo- 

We now turn to the integration over the “massive modes”, parametrized by 


6a = T 


5aii ■■ 

• Saia 




T- 


(30) 


where T = diag (T^^\ ..., T^“i) and where Saab is a (2ma|2TOa) x {2mb\2mb) supermatrix. The diagonal blocks satisfy 
the commutation relations [(JcTooj = 0 since the remaining integration, in particular the components which do not 
commute, is accounted for by the integrals over T^°'>. The challenging part in specifying the whole symmetries of the 
blocks Saab are the phases in front. The quadratic part in Sa of the Lagrangian (18) has to be positive definite and 
must ensure convergence. We define the complex numbers 


g(+) = 2! V ^ 

ab „ A-l -1- 


and 


P ^ A* ^ + go(a;(“)) ^ + qo{x(’>'>) qo{x(‘^'^)qo{x(^)) 

A-) = 1 


1 


1 


-ab p ^ 1 go(a;(“)) A^ 1 + {qoixd’'^))* go(a;(“))(go(a;('')))* 

allowing us to split the matrix blocks as follows 

Saaa = i — Sai°°^ 


1 


f. a<b 

— 


a>b 

^^ab — 


ziV 

1 


d+) 

'ab 


-.5(7, 


( 00 ) 

ab 


UV) 

1 




' zr'+ 


1 


-Sa 


( 11 ) 




d-) 


ab 


1 


.(+) 

'ab 




-.5(7, 


(ii)t 

ab 


+ 


1 r (01)f 

^ab 


i^ib)* 

1 


-Sa 


( 10 ) 
ab ’ 


\fiAJ~y 


-.5a 


(10)t 


ab 


(31) 

(32) 

(33) 

(34) 

(35) 


such that Ld') Sa^il'’ = (—^ Blocks of the form Sa^^^ and Sa^a^ do not exist, because of the required 
commutation relation with We recall the dimensions and which essentially are the signature of 
The diagonal matrix blocks 5aaa'^ are Hermitian supermatrices of dimension (2777.^“^x {2trS-


\2rn'^') where 

the boson-boson blocks are real symmetric and the fermion-fermion blocks are Hermitian self-dual. The off-diagonal 


block Sa^^p has dimension (2m^“'’|2m^“'') x (2m^“''|2 to^“’’). Its boson-boson block is an arbitrary real matrix and its 
fermion-fermion block an arbitrary quaternion matrix. The integration measure of T is the Haar measure on the coset 
and the one of da is the flat Lebesgue measure for the commuting and the Berezin measure for the anticommuting 
variables. The Lagrangian (18 1 takes the form 








^ l<a<b<aij==0,l ^ a=l j=0,l 


(36) 


where ^ = diag 


, 1^“^) is splitted analogously to L. The prefactors of the individual blocks 5a^^l\ see Eqs. (|33 


35), cancel in the Berezinian of the change of coordinates for the supermatrix a into the Goldstone and the massive 
modes because we have for each of these blocks the same number of real variables and Grassmann variables. 

To proceed we have to carefully analyze the Efetov-Wegner boundary terms [35, 36]. They are an inherent feature
of superanalysis without counterpart in ordinary analysis. These terms appear whenever a change of variables is 
performed on superspaces with boundaries, including those boundaries induced by coordinate singularities of the 
Berezinian. 


B. Macroscopic Level Density 


In the case of the macroscopic level density, z.e., fc = 1, = 0 and Efetov-Wegner boundary terms 

cannot appear because we only shift and rescale the supermatrix a —> 5a. The Gaussian integral over 5a cancels the 
constant lim„_>oo K^i^i = l/{ 87 r^) in the limit n —>■ oo. The level density becomes 


$$ R_1(x) \approx \lim_{\varepsilon \to 0} \mathrm{Im} \left[ \frac{1}{\gamma^2 \pi}\, q_0(x + i\varepsilon) + \frac{\gamma^{-2} - 1}{\pi}\, \frac{1}{x + i\varepsilon} \right] = \frac{1}{\gamma^2 \pi}\, \mathrm{Im}\, q_0(x) , \qquad (37) $$

for all values of $l$. Hence, the saddle point solution $q_0(x)$ is, up to the normalization $1/\gamma^2$, the Green function (also
known as Cauchy or Stieltjes transform) of the density $R_1(x)$. When writing $\mathrm{Im}\, q_0(x)$ as shorthand, we view the
$1/x$ singularity of $q_0(x)$ at the origin as a real term which may be neglected. Thus, the chain of equalities (37) is
consistent.
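The relation between the saddle point solution and the level density can be tested in the simplest setting. The sketch below assumes the scalar saddle point equation in the form $x = -1/q + (\gamma^2/p) \sum_i \Lambda_i/(1 + \Lambda_i q)$, selects the root with the largest imaginary part as $q_0(x)$, and evaluates $\mathrm{Im}\, q_0(x)/(\gamma^2 \pi)$. For the uncorrelated case $\Lambda = 1$ the result is compared with the well-known Marčenko-Pastur density. All function names are hypothetical.

```python
import numpy as np

def q0(x, lam, gamma2):
    """Root of the assumed saddle point equation with the largest imaginary part."""
    lam = np.asarray(lam, dtype=float)
    p = lam.size
    F = np.poly1d([1.0])
    for l in lam:
        F = F * np.poly1d([l, 1.0])        # F(q) = prod_i (1 + lam_i q)
    q = np.poly1d([1.0, 0.0])
    roots = (-(x * q + 1.0) * F + (gamma2 / p) * q * F.deriv()).roots
    return roots[np.argmax(roots.imag)]

def level_density(x, lam, gamma2):
    """Macroscopic level density Im q0(x) / (gamma2 * pi)."""
    return np.imag(q0(x, lam, gamma2)) / (gamma2 * np.pi)

def marchenko_pastur(x, gamma2):
    """Reference density for C equal to the identity and ratio p/n = gamma2."""
    a = (1.0 - np.sqrt(gamma2)) ** 2
    b = (1.0 + np.sqrt(gamma2)) ** 2
    return np.sqrt(max((b - x) * (x - a), 0.0)) / (2.0 * np.pi * gamma2 * x)

gamma2 = 0.5
rho = level_density(1.0, [1.0], gamma2)    # uncorrelated case, Lambda = 1
rho_mp = marchenko_pastur(1.0, gamma2)
```

Outside the spectral support all roots are real, so the same function returns a vanishing density there; for a non-trivial set of empirical eigenvalues one simply passes them as `lam`.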

The coincidence of $q_0(x)/\gamma^2$ with the Green function implies that the function $g(q) - 1/q$, see Eq. (20), can be
identified with the R transform in the theory of free probability. An introduction to free probability in random matrix
theory can be found in Ref. [S2 [5S].


The Dirac $\delta$ function or equivalently the second term under the limit in Eq. (37) is important for $\gamma^2 < 1$ when the
limit $\varepsilon \to 0$ is still to be taken. To clarify this we consider the asymptotics of the saddle point solution for $x + i\varepsilon \to 0$
which is equivalent to $q \to -\infty$, cf. Fig. We employ the asymptotics (22) of the function $g$. Taking into account
the first two terms only we find the asymptotic behaviour of the saddle point solution as $q_0(x + i\varepsilon) \approx (\gamma^2 - 1)/(x + i\varepsilon)$
for $|x + i\varepsilon| \ll 1$. The imaginary part of this term yields in the limit $\varepsilon \to 0$ the Dirac $\delta$ function at the origin which
we subtract.


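To study the edges of the spectral support we again start from the saddle point equation (20). Multiplying this
equation with $q_0(x)$ and taking the imaginary part for $x > 0$ we find

$$x\,\mathrm{Im}\,q_0(x) = \frac{\gamma^2}{p}\sum_{j=1}^{p}\frac{\lambda_j\,\mathrm{Im}\,q_0(x)}{(1+\lambda_j\,\mathrm{Re}\,q_0(x))^2 + \lambda_j^2(\mathrm{Im}\,q_0(x))^2}. \tag{38}$$

A similar equation can be derived for the real part,

$$x\,\mathrm{Re}\,q_0(x) = \gamma^2 - 1 - \frac{\gamma^2}{p}\sum_{j=1}^{p}\frac{1+\lambda_j\,\mathrm{Re}\,q_0(x)}{(1+\lambda_j\,\mathrm{Re}\,q_0(x))^2 + \lambda_j^2(\mathrm{Im}\,q_0(x))^2}. \tag{39}$$

The latter equation can be rewritten to

$$\mathrm{Re}\,q_0(x) = \frac{\gamma^2 - 1 - (\gamma^2/p)\sum_{j=1}^{p} 1/[(1+\lambda_j\,\mathrm{Re}\,q_0(x))^2 + \lambda_j^2(\mathrm{Im}\,q_0(x))^2]}{x + (\gamma^2/p)\sum_{j=1}^{p}\lambda_j/[(1+\lambda_j\,\mathrm{Re}\,q_0(x))^2 + \lambda_j^2(\mathrm{Im}\,q_0(x))^2]}, \tag{40}$$

which is obviously always negative because $\gamma^2 = p/n < 1$. Hence the sum $1 + \lambda_{j_0}\mathrm{Re}\,q_0(x)$ might vanish for a particular
$\lambda_{j_0}$ such that we have to be careful. However, this scenario does not happen at an edge where either $\mathrm{Im}\,q_0(x) \to 0$ or
$\mathrm{Im}\,q_0(x) \to \infty$ due to the following reason. Suppose $\mathrm{Re}\,q_0(x) = -1/\lambda_{j_0}$ with degeneracy $l_0$, then Eq. (38) reads

$$x\,\mathrm{Im}\,q_0(x) = \frac{\gamma^2}{p}\sum_{\lambda_j \neq \lambda_{j_0}}\frac{\lambda_j\,\mathrm{Im}\,q_0(x)}{(1-\lambda_j/\lambda_{j_0})^2 + \lambda_j^2(\mathrm{Im}\,q_0(x))^2} + \frac{\gamma^2 l_0}{p}\,\frac{1}{\lambda_{j_0}\mathrm{Im}\,q_0(x)}, \tag{41}$$

which is never satisfied by either of the two values $\mathrm{Im}\,q_0(x) = 0, \infty$. Thus we only have $1 + \lambda_j\mathrm{Re}\,q_0(x) \neq 0$ for all $j = 1, \ldots, p$
at an edge.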

For Eq. (38), there are only two types of solutions. Either we are at the origin, then $\mathrm{Im}\,q_0(x)$ has to diverge
according to $\mathrm{Im}\,q_0(x) = c/\sqrt{x}$ with $c^2 = (\gamma^2/p)\sum_{j=1}^{p}\lambda_j^{-1}$ to satisfy Eq. (38), or the edge is not at the origin, then
we can expand Eq. (38) for small $\mathrm{Im}\,q_0(x)$ which yields the square root behavior

$$\mathrm{Im}\,q_0(x) \propto \begin{cases}\sqrt{x - x_{\mathrm{edge}}}, & x_{\mathrm{edge}}\ \text{is a lower bound of a cut},\\ \sqrt{x_{\mathrm{edge}} - x}, & x_{\mathrm{edge}}\ \text{is an upper bound of a cut}.\end{cases} \tag{42}$$

The largest eigenvalue lies at the edge

$$x_{\max} = g\left(-\int_{-\lambda_p^{-1}}^{0}\Theta(g'(q))\,dq\right) \tag{43}$$

with the Heaviside function $\Theta$. This result follows from the saddle point equation (20) and from the monotonic
behavior of $g'(q)$, cf. Fig. Accordingly, the smallest eigenvalue lies at the edge

$$x_{\min} = g\left(-\int_{-\infty}^{-\lambda_1^{-1}}\Theta(-g'(q))\,dq - \lambda_1^{-1}\right). \tag{44}$$


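When we have more than only one cut in the spectrum, we find upper bounds at

$$x_{+}^{(j)} = g\left(-\frac{\left(\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}}\Theta(g'(q))\,dq\right)^2 - 2\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}} q\,\Theta(g'(q))\,dq}{2\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}}\Theta(g'(q))\,dq}\right) \tag{45}$$

and lower bounds at

$$x_{-}^{(j)} = g\left(\frac{\left(\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}}\Theta(g'(q))\,dq\right)^2 + 2\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}} q\,\Theta(g'(q))\,dq}{2\int_{-\lambda_{j-1}^{-1}}^{-\lambda_j^{-1}}\Theta(g'(q))\,dq}\right) \tag{46}$$

in the interval $]-\lambda_{j-1}^{-1}, -\lambda_j^{-1}[$ with $j = 2, \ldots, p$. Edges are not found in $]-\lambda_{j-1}^{-1}, -\lambda_j^{-1}[$ when $x_{+}^{(j)} = x_{-}^{(j)}$, in particular as $g'(q)$
is strictly negative in $]-\lambda_{j-1}^{-1}, -\lambda_j^{-1}[$. It might happen that two cuts start merging such that the latter scenario
occurs. Then one has to take into account the second derivative $g''(q)$ and the level density behaves as $(x_{\mathrm{edge}} - x)^{1/3}$
where we expect Pearcey kernel [Ml ES] behavior on the local scale. We do not show this in the present work.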
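The edge positions discussed above are the critical values of $g$, i.e., the values $g(q^*)$ at the real solutions of $g'(q^*) = 0$. The following Python sketch locates them numerically. It assumes the form $g(q) = -1/q + (\gamma^2/p)\sum_j\lambda_j/(1+\lambda_j q)$ for the function in the saddle point equation (20), which is the form consistent with Eqs. (38)-(40), and it assumes $\gamma^2 < 1$ so that the physical solutions lie at $q < 0$. The function names are illustrative only.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def g(q, lam, gamma2):
    """Assumed form of the function g in the saddle point equation g(q) = x."""
    lam = np.asarray(lam, dtype=float)
    return -1.0 / q + gamma2 * np.mean(lam / (1.0 + lam * q))


def spectral_edges(lam, gamma2, tol=1e-7):
    """Edges of the spectral support as critical values of g.

    g'(q) = 1/q^2 - (gamma2/p) sum_j lam_j^2 / (1 + lam_j q)^2 is multiplied by
    q^2 prod_k (1 + lam_k q)^2, which turns g'(q) = 0 into a polynomial equation.
    """
    lam = np.asarray(lam, dtype=float)
    values, counts = np.unique(lam, return_counts=True)
    weights = gamma2 * counts / lam.size
    poly = np.array([1.0])
    for v in values:
        poly = P.polymul(poly, P.polypow([1.0, v], 2))
    for k, (v, w) in enumerate(zip(values, weights)):
        rest_sq = np.array([1.0])
        for m, u in enumerate(values):
            if m != k:
                rest_sq = P.polymul(rest_sq, P.polypow([1.0, u], 2))
        poly = P.polysub(poly, P.polymul([0.0, 0.0, w * v ** 2], rest_sq))
    roots = P.polyroots(poly)
    # Keep the real, negative critical points and map them to x = g(q).
    real_q = roots.real[(np.abs(roots.imag) < tol) & (roots.real < 0.0)]
    edges = np.array([g(q, lam, gamma2) for q in real_q])
    return np.sort(edges[edges > 0.0])
```

For the uncorrelated case, $\lambda_j = 1$, the two edges returned are $(1 \pm \gamma)^2$. When two cuts are about to merge, the critical points become nearly degenerate and the tolerance `tol` has to be chosen with care.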

The situation slightly changes when considering the exact limit $n, p \to \infty$ and $\gamma^2 = p/n$ fixed where we have to
assume a limiting density $\rho(\lambda)$ for the empirical eigenvalues $\lambda$, cf. Eq. (16). As long as $-\mathrm{Re}\,q_0(x)$ is in the support of
the empirical density $\rho(\lambda)$ where this density is finite we can carry out the same analysis as above because the saddle
point solution has to satisfy the counterpart of Eq. (38) which is

$$x\,\mathrm{Im}\,q_0(x) = \gamma^2\int_0^{\infty}\frac{\lambda\rho(\lambda)\,d\lambda}{(1+\lambda\,\mathrm{Re}\,q_0(x))^2 + \lambda^2(\mathrm{Im}\,q_0(x))^2}\,\mathrm{Im}\,q_0(x). \tag{47}$$

As the integrand is non-integrable for $\mathrm{Im}\,q_0(x) = 0$ we conclude that $\mathrm{Im}\,q_0(x)$ has to be finite. This argument also
applies when $-\mathrm{Re}\,q_0(x)$ is at an edge of $\rho(\lambda)$ where the density either diverges (this divergence has to be integrable
and to satisfy assumption (HI)) or remains finite. Here we exclude the origin where the behavior is different.

When $-\mathrm{Re}\,q_0(x)$ is taken at an edge where $\rho(\lambda)$ vanishes it may happen that $\mathrm{Im}\,q_0(x)$ vanishes, too, which is,
however, very unlikely. In particular we would expect this scenario only when cuts may start to merge implying that
the edge is located in the bulk of the spectrum. The generic case is that $\mathrm{Im}\,q_0(x)$ vanishes when $-\mathrm{Re}\,q_0(x)$ is outside
of the support of the empirical density $\rho(\lambda)$. Hence, if this is the case and we are at a soft edge, i.e., $\mathrm{Im}\,q_0(x) \to 0$,
we may expand Eq. (47) for small $\mathrm{Im}\,q_0(x)$ and find the square root behavior (42).

A hard edge (with $\mathrm{Im}\,q_0(x) \to \infty$) of the macroscopic level density (37) only appears at the origin $x = 0$. This
follows from Eq. (47) when $\gamma^2 = p/n \to 1$. We find the standard $1/\sqrt{x}$ behavior in the case that $\rho(\lambda)$ is separated
by a finite gap from the origin. The situation drastically changes when the support of $\rho(\lambda)$ touches the origin. For
example for $\rho(\lambda) = \Theta(1 - \lambda)$ we find a singular behavior with $\sqrt{|\ln x|/x}$. The condition for encountering the standard
singularity $1/\sqrt{x}$ is the existence of the integral $\int\rho(\lambda)\,d\lambda/\lambda$.


We restrict ourselves to a detailed discussion of the soft edges having the form (42) in section IV. On the local
scale, we will find the Airy statistics as for the uncorrelated Wishart ensemble.


C. Correlation Functions 


We turn to the $k$-point correlations for arbitrary $k \in \mathbb{N}$. We may assume that $x^{(o)} > 0$, $o = 1, \ldots, \alpha$ and that these
points do not lie at a boundary of the support of the spectral density ( [^ . We thus omit the Dirac S contributions at 
the origin, in particular the terms l/(xj + iLjs) in Eq. ([^. We integrate over the non-diagonal supermatrix blocks 
ScTab {a ^ b) which yields a constant equal to (27r^)^™“'"'’ for the block Saab- We recall that Efetov-Wegner terms do 
not occur since we only rescale those blocks. 

The remaining integrations produce the well-known spectral statistics built upon the sine kernel for real eigenvalues.
To show this, we recall, in an appropriate formulation, the integral representation of the $k$-point correlation
functions on the local scale of a Gaussian Orthogonal Ensemble (GOE) of $nl \times nl$ real symmetric matrices $H$, see [44]


£->■0 Lg {± 1 }'' 

j-J-O 


Li „ / d[i7] exp(—nZ tr i7^)sdet 0 l 2 fe| 2 fc ~ In; G ( 7 r^/(nZ) + j + leL)) 


= JiSo E / dW exp ( - ) n ^®tr 

lg{±i} 


/ d[i7] exp[—nZtr 
0 


r gfc. 


i=l 


0 

e ^33 


19 


(48) 


where cr is a supermatrix with the same symmetries as in Eq. ([^ . The Lagrangian is given by 


£(cr) = -strcT^ — zstr 


leL + ^ 1 ^ ~ f 1 ^ 1 str In cr. 


(49) 


The scaling of the local fluctuations $\pi\xi/(nl)$ is motivated by the local GOE level spacing at the origin. The $(2k|2k) \times (2k|2k)$ supermatrix $\sigma$ is integrated over the same domain as in Eq. ([^. This $k$-point correlation function (48)
contains the real sine kernel as can also be derived by other methods such as skew-orthogonal polynomials, e.g., see
Refs. |5M55] .

The saddle point equation in the limit n —>■ 00 of Eq. ( |4^ is simply a — a~^. After an analysis similar to the one 
in subsection HI A| we find the saddle point manifold a = T{L + Sal^/n)T ^ with T G UOSp(L)/[UOSp(2A:o|2fco) x 
UOSp(2fci|2fci)J where Sa is any Saaa in Eq. (33) with replaced by k. The integers fco and ki are the numbers 


of $+1$'s and $-1$'s among the $L_j$'s, respectively. The Lagrangian (49) becomes


c (tlt-' + ^ 


2n \ nl 


I TLT 


1-1 


-O 


\ n' 


(3/2 


(50) 


which we compare with the approximation (36) of the Lagrangian for the correlated Wishart model. Thus, the 
identification 


= Ri(x(°))Ci°) ^ d^,°) = i?i(x(°))de(°) 


(51)
with $k \to m^{(o)}$ must yield the same approximation. Indeed, Eq. (51) is the unfolding prescription to uncover the local
spectral fluctuations at the position $x^{(o)} > 0$.


To further solidify our line of reasoning, we now show that the remaining parts of the integrand (52) agree with 
this unfolding. Abbreviating the Efetov-Wegner boundary terms with “b.t.”, we have 


lim Rkix,^)d[^] = 
1 — 


p—>-oo 




K M 

oo,mQ oo,m^ 


d[^(°)]lim ^ /d/r(T(°)) 


r{°) r.{°) 


L)^'=±l 


^ L)°hiLnqo{x^°'>) / \ \ t -i-i 

i=i 


STry^ 


e™° 0 

0 -e”: 


(52) 


X exp ^ - estiT^°'> 


“hb-t. 




The ratio of the constants AToo.mo? see Eq. (|^, in front of the flat measure d[^(°^] results from the original constant 
Kni,k and from the integration over Sa. The real parts of qo{x^°^) drop out because the corresponding integrands are 
symmetric under the transformation ^ VT^°> where V embeds the supergroup UOSp(2|2). This embedding in 

the form of a (2|2) x (2|2) supermatrix corresponds to the diagonal matrix diag (e^°) e^°; “6™°) which breaks 

this symmetry for the imaginary parts of qo(x^°^). Adjusting Cauchy-like integration theorems d la Wegner |34b 
to our case of UOSp(2|2) we find that the corresponding blocks of vanish such that the sign Lj drops 
out in the integrand, including the Lagrangian, and the sum over Lj cancels this contribution. Hence the integral 
only depends on the imaginary part of the saddle point. 

The measure $d\mu(T^{(o)})$ is the Haar measure on the coset $\mathrm{UOSp}(L^{(o)})/[\mathrm{UOSp}(2m_0^{(o)}|2m_0^{(o)}) \times \mathrm{UOSp}(2m_1^{(o)}|2m_1^{(o)})]$.
Its normalization is induced by the flat measure $d[\sigma]$ from which we started. The $\varepsilon$ term in the exponential function
still guarantees absolute convergence of the integral because we may have non-compact group integrals comprised in
$T^{(o)}$. We absorbed the prefactor in this latter term because we take the limit $\varepsilon \to 0$. The integral over the remaining
massive modes $\delta\sigma$ also yields a constant equal to unity as the numbers of ordinary variables and Grassmann variables
are the same.

What is the contribution of the Efetov-Wegner boundary terms in Eq. (52)? We apply Rothstein's theory [33] to
make changes of variables in superspace. Its main result is that Efetov-Wegner terms can be associated with certain
vector fields, here denoted $Y_o$, see appendix A. For the $k$-point correlation function (52), we change the integration
variables according to (7*'°°^ —)■ T^°\Saaa/^/p—^qo{x^°'^ Here, a^°°^ is the (2m*^°^ |2m^°^) x (2 to^°^ |2to^°^) 

supermatrix block of a which is at the same position in matrix space as {6aaa/— ^qo{x^°^ + isL^°l))T^°'^ 
Then Eq. (52) becomes 


lim Rk{x;^) = TT 


p—>-oo 


Koo,ma lim 
£—^0 


E 


eM-Yo{T^°\Saoo)]dp{T‘^°^)d[5ao 




X 


L^°hmgo(a;l°l) / \ ~( \ t i-i e 


i=i 
X exp 


STry^ 

[igoC 

V 2y2 


■33 0 

0 


(53) 


/ zimgo(xl°l) |(o)y(o)j^(o)y(o) 1 _ £strTl°lr(°l^ - -^strfo^^ 

I o_.9 > ^^2 tjc 


where all Efetov-Wegner boundary terms are taken care of by the vector fields $Y_o(T^{(o)}, \delta\sigma_{oo})$. Unfortunately, explicit
expressions for those vector fields are not available in general. Only for the case of Hermitian supermatrices a successful
explicit identification of all Efetov-Wegner boundary terms was achieved in Ref. m at small matrix dimension and
in Ref. [53] for general supermatrix size. The order of the action of the operators $\exp[-Y_o(T^{(o)}, \delta\sigma_{oo})]$ and the
measure $d\mu(T^{(o)})\,d[\delta\sigma_{oo}]$ is important since $d\mu(T^{(o)})$ also incorporates non-trivial ingredients, see appendix A. Hence,
$Y_o(T^{(o)}, \delta\sigma_{oo})$ does not only act on the integrand but on this measure, too.


We now can exactly identify the product of integrals (53) with the $k$-point correlation function (48). The vector
fields $Y_o(T^{(o)}, \delta\sigma_{oo})$ do fully coincide with those for the correlated Wishart ensemble because we perform the same
change of integration variables. The integrands are also equal in the large $p$-limit, apart from the rescaling of the
spectral fluctuations (unfolding), see Eq. (51). We infer the important result that both correlation functions, including
all Efetov-Wegner boundary terms, are exactly the same. The second equality of Eq. (52) reflects the universality of
the local spectral fluctuations.

A last remark is in order. The factorization of Rk{x) into the -point correlation functions X^{o) ..., ^(o)) 

does not come as a surprise since we zoom into the spectrum at different points ... Those points are 

macroscopically separated such that eigenvalues around should be statistically independent from those around 
another point x^^^. This is so because the other infinitely many eigenvalues in between cause a screening. The next 
to leading order in the $1/p$ expansion, however, must crucially depend on the random matrix model, especially the
confining potential, see, e.g., Ref. [5U] .


IV. OUTLIERS AND SOFT EDGES 

In subsection IV A we investigate the limiting positions and the fluctuations of possibly existing outliers. In
subsection IV B we derive the exact real Airy kernel statistics at any soft edge of the bulk. In subsection IV C we
trace back the calculation of the cumulative density function to a skew-orthogonal polynomial problem.


A. Outliers 


An outlier is an eigenvalue that is separated from all other eigenvalues. It thus suffices to investigate the level
density (37) because the higher correlations involving outliers are suppressed. We may neglect the outliers in the
saddle point equation (20) for the bulk of the eigenvalue density because they are $1/p$ corrections, but we have to
study their average position and the width of their distribution. The peaks in their distribution result from the fact
that the saddle point solution $q_0(x)$ cannot stay on the real line in the vicinity of the poles $-1/\lambda_j$. The solution $q_0(x)$
has to leave the real line when tuning $x$, even though this is only necessary for a very small interval in $x$.

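We consider the outlier $\lambda_0$, say. To analyze the behavior of the saddle point solution in the presence of $\lambda_0$, we
expand the eigenvalue variable $x = x_0 + \delta x/\sqrt{p}$ and the saddle point $q_0(x) = -1/\lambda_0 + \delta q_0/\sqrt{p}$ in Eq. (20). The scaling
$1/\sqrt{p}$ for the deviations, $\delta x$ and $\delta q_0$, will turn out to be the correct one later on. The variable $\delta x$ probes the level
density around the point $x_0$ and, thus, plays the same role as $\xi$ in Eq. ([^. The point $x_0$ is the position of the outlier
peak for $p \to \infty$, while its corresponding point of the saddle point solution is the pole $q_0(x_0) = -1/\lambda_0$. To express $x_0$
and $\delta q_0$ as functions of $\delta x$, we expand the saddle point equation (20) up to order $1/\sqrt{p}$,

$$x_0 + \frac{\delta x}{\sqrt{p}} = \lambda_0 + \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_0\lambda_j}{\lambda_0-\lambda_j} + \frac{1}{\sqrt{p}}\left[\frac{\gamma^2}{\delta q_0} + \left(\lambda_0^2 - \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_0^2\lambda_j^2}{(\lambda_0-\lambda_j)^2}\right)\delta q_0\right] + O\!\left(\frac{1}{p}\right). \tag{54}$$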


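This expansion is not valid for the eigenvalues inside a bulk of eigenvalues since then the difference $\lambda_0 - \lambda_j$ might
be less than order one for some $j \neq 0$, implying higher order terms in $p$ in the expansion (54). We now see why the
above variations around $x_0$ and $q_0 = -1/\lambda_0$ were chosen of order $1/\sqrt{p}$: other dependencies would lead to
inconsistent expansions. Identifying the terms of order one and $1/\sqrt{p}$ yields two results, namely

$$x_0 = \left(1 + \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_j}{\lambda_0-\lambda_j}\right)\lambda_0 \tag{55}$$

for the limiting position and

$$\delta q_0(\delta x) = \left(\lambda_0^2 - \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_0^2\lambda_j^2}{(\lambda_0-\lambda_j)^2}\right)^{-1}\left[\frac{\delta x}{2} + i\sqrt{\gamma^2\left(\lambda_0^2 - \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_0^2\lambda_j^2}{(\lambda_0-\lambda_j)^2}\right) - \frac{\delta x^2}{4}}\,\right] \tag{56}$$

for the deviation of the saddle point solution from the pole $q_0 = -1/\lambda_0$. Interestingly, the position of the outlier is
not directly at $\lambda_0$ but slightly shifted, cf. Eq. (55). Only in the limit $p \gg 1$ and $\lambda_0 \gg \lambda_j$ for all $\lambda_j$ in the bulk of
the eigenvalues, we have $x_0 = \lambda_0$.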


The fluctuations of the outlier around the position $x_0$ are of the order

$$\Delta x_0 \approx 2\gamma\,\frac{\lambda_0}{\sqrt{p}}\sqrt{1 - \frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_j^2}{(\lambda_0-\lambda_j)^2}} \tag{57}$$

as can be read off from the relation between the level density (37) and the saddle point $q_0$. The saddle point only
yields a contribution to the spectral density if it has a non-vanishing imaginary part which, in turn, can only result
from the square root in Eq. (56). This implies a condition on the empirical eigenvalues for the expansion (54) to hold,

$$\frac{\gamma^2}{p}\sum_{j\neq 0}\frac{\lambda_j^2}{(\lambda_0-\lambda_j)^2} < 1. \tag{58}$$


This condition can occasionally be violated for some time series as we show in our numerical simulations in section V.
In such cases the expansion (54) fails because the matrix dimensions are too small. We expect that the condition (58)
is always true for sufficiently large $p$ and $n$ and for an outlier $\lambda_0$ that is larger than the soft edge of the bulk. This is
consistent with the $1/p$ suppression of the contribution due to other outliers in the sum (58). If $p, n$ are too small, the
condition (58) fails whenever $(\lambda_0 - \lambda_j)^2 < \lambda_j^2/n$ with $\lambda_j$ being another outlier. The difference $(\lambda_0 - \lambda_j)$ then has
an order $1/\sqrt{n}$ behavior, resulting in a higher order polynomial equation for the saddle points. Another problem arises
when the outlier is too close to a soft edge of a bulk of eigenvalues. Such a situation can emerge when the noise in
the data becomes too strong and the outliers start to merge with the bulk. Again, the saddle point equation has to
be modified. The worst scenario is when both situations occur simultaneously.

Another point deserves further discussion. The square root behavior of the level density, also known from Wigner's
semi-circle law, cannot be interpreted as the limiting distribution of the outlier. A simple argument from the full
random matrix model 0 shows that, for large $p$ and fixed degeneracy $l$, the distribution for the outlier around $\lambda_0$
coincides with the level density of the $l \times l$ Gaussian Orthogonal Ensemble (GOE) centered at $x_0$ and with fluctuations
proportional to $\Delta x_0$. The shape of the outlier level density encodes the number of eigenvalues, i.e., the degeneracy $l$,
Fig.H Nonetheless the position and the widths of the distributions of the outliers are still the same. Only in the limit
$l \to \infty$ of infinite degeneracy we find Wigner's semi-circle law again. The reason for this behavior is the macroscopic
distance of the bulk and the outlier. The distributions of the individual eigenvalues of the outlier (indeed we have
more than one because of the degeneracy $l$) and those in the bulk will only have an exponentially small overlap such
that one can consider the eigenvalues in the outlier separately.

Interestingly, the results (55) and (57) seem also applicable to outliers which lie on a scale different from that of the
bulk. The limiting position $x_0$ as well as the order $\Delta x_0$ of fluctuations scale with $\lambda_0$. They are thus likely to become
$x_0 = \lambda_0$ and $\Delta x_0 \approx 2\gamma\lambda_0/\sqrt{p}$ for large $p$.
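The limiting position, the width and the validity condition for an outlier can be evaluated directly from the empirical eigenvalues. The following Python sketch assumes the expressions $x_0 = \lambda_0[1 + (\gamma^2/p)\sum_{j\neq 0}\lambda_j/(\lambda_0-\lambda_j)]$ for the position, $\Delta x_0 \approx 2\gamma(\lambda_0/\sqrt{p})\sqrt{1 - s}$ for the width and $s = (\gamma^2/p)\sum_{j\neq 0}\lambda_j^2/(\lambda_0-\lambda_j)^2 < 1$ for the condition (58). The function name and its arguments are hypothetical.

```python
import math


def outlier_statistics(lam0, others, n):
    """Limiting position, width and validity sum for the outlier lam0.

    `others` holds all remaining empirical eigenvalues lam_j (j != 0) and
    n is the length of the time series, so that gamma^2 = p/n.
    """
    p = len(others) + 1
    gamma2 = p / n
    shift = gamma2 / p * sum(l / (lam0 - l) for l in others)
    s = gamma2 / p * sum(l ** 2 / (lam0 - l) ** 2 for l in others)
    x0 = lam0 * (1.0 + shift)
    # The width is only defined when the condition s < 1 is fulfilled.
    width = None
    if s < 1.0:
        width = 2.0 * math.sqrt(gamma2) * lam0 / math.sqrt(p) * math.sqrt(1.0 - s)
    return x0, width, s
```

As a usage example, `outlier_statistics(4.0, [1.0] * 39, 100)` describes a single outlier $\lambda_0 = 4$ above a bulk of 39 unit eigenvalues for $n = 100$. A returned width of `None` signals that the condition $s < 1$ is violated, so that the expansion around the pole cannot be trusted.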


B. Airy Statistics at the Soft Edges 


We restrict ourselves to the soft edges where the asymptotic level density (37) vanishes as a square root and derive
the Airy kernel. We do not consider higher order multi-critical points which cannot be excluded a priori. For the
sake of simplicity we assume that $x_j = x_0$ for all $j = 1, \ldots, k$ coincides with the position where the level density (37)
vanishes. Then we have only one saddle point $Q_0 = q_0(x_0)\mathbf{1}_{2k|2k}$. We recall that $q_0(x_0)$ starts to become real at the
edge $x_0$ such that it does not have an imaginary part that is related to the metric $L$. The scale of the local fluctuations
^ changes to it = xol 2 kl 2 k + J + C/employing the notation of Eq. ([^. This scale reflects the fact that 
the level density vanishes like a square root and that one has to expand around the saddle point up to third order.
The massive modes $\delta\sigma$ around the saddle point solution scale differently, too; in particular we have the expansion
$\sigma = -iq_0(x_0)\mathbf{1}_{2k|2k} + \delta\sigma/(lp)^{1/3}$. On the scale of the local fluctuations, we take into account the position of the outlier
and expand the Lagrangian (|10[) up to order 1/p, 


C xotk 


5 5cr \ 

, -noMk\2k + j 


'{ipy/3 


1 


{ipy/^ 

2 Ah A, 


E 


1 


, 7 - ^0 - 

P ^ 1 + A,qo qo 


str Sa - 


2(/p)2/3 


v2 P 


E 


A? 


P ^ y + Aiqoy 




str Sa'^ 


(59) 




A? 


2 P 

-T 


-3 ) 5a^ + 3(f -I- i(lpy^^eL)5a 


% 


O 


f 1 




The term of order $O(p^0) = O(1)$ drops out because the saddle point is proportional to the identity matrix such that
the corresponding supertraces vanish. Moreover the terms of order $1/(lp)^{1/3}$ and $1/(lp)^{2/3}$ vanish because $q_0(x_0)$ is
the contributing saddle point at the position $x_0$ where the level density vanishes in a square-root fashion. The first
term of Eq. (59) is the function $g(q(x), x)$ appearing in the scalar saddle point equation (20), while the second term
is its derivative $g'(q)$ with respect to the variable $q$. We underline that the derivative $\partial_q g(q, x) = g'(q)$ vanishes at $q_0$,
too, which can be seen as follows. On the one hand, $R_1(x)$ vanishes as $\sqrt{|x - x_0|}$ such that the Cauchy transform
of $R_1(x)$, which is up to normalization the saddle point solution $q_0$, has a divergent first derivative at $x = x_0$, i.e.,
$q_0'(x) \propto 1/\sqrt{|x - x_0|} \to \infty$ for $x \to x_0$. On the other hand the total derivative of the function $g(q_0(x), x)$ in the
variable $x$ yields $0 = dg/dx(q_0(x), x) = -1 + g'(q_0(x))\,q_0'(x)$ which indeed has to vanish because $q_0$ is the saddle point
solution. Thus we have $g'(q_0(x)) = 1/q_0'(x)$ implying that $g'$ vanishes at $q_0(x_0)$.

The $1/p$ term in Eq. (59) is the leading term of the Lagrangian. Thus the $k$-point correlation function at the edge
$x_0$ takes the form


/c 

Rk{xo,ilpy^^^)d[{lpy/^^]^^^Kn\im^ ^ / d[(5cr] ^ str 5cr 

r_ r. 0 — 1 ' 


Li,...,Lfe=±l i = l 
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e^- 0 

O TYln 




str 


-V- 

P ^ (1 + A*go(a:o))3 gi^(a;o)^ 


6a^ + + i{lp)^^^eL)Sa 


(60) 


We reiterate that Efetov-Wegner boundary terms and, hence, non-vanishing vector fields à la Rothstein m do not
appear. The coordinate transformation is only a constant shift that cannot cause such contributions. The terms where
we replace $\delta\sigma$ or its higher powers by the leading order saddle point solution $-iq_0(x_0)\mathbf{1}_{2k|2k}$ inside the product of the
integrand in Eq. (60) vanish for the same reason as in the bulk, see the discussion below Eq. (52). The integral turns
out invariant under the sub-supergroup UOSp(2|2) such that Cauchy-like Wegner integration theorems [Ml l47lj49]
apply which reduce the integral over the supermatrix $\delta\sigma$ to an integral over a supermatrix of a smaller dimension.
Then one of the signs $L_j$ drops out and, thus, the remaining integrand is independent of the sign of the imaginary
increment over which the sum runs. Precisely this sum yields zero due to the additional alternating signs $L_j$ in the
product.

The limit $\varepsilon \to 0$ together with the sign matrix $L$ and, thus, the original domain of integration of $\sigma$ fixes the
integration contour for the eigenvalues of the boson-boson and fermion-fermion blocks of $\delta\sigma$. The contour for an
eigenvalue $s_{Bj}$ in the boson-boson block consists of two disjoint half-lines and is equal to the union $-i\mathbb{R}_+ \cup L_j\mathbb{R}_+$.
We emphasize that the integration over $-i\mathbb{R}_+$ results from the negative sign of $q_0(x_0)$ implying that the saddle point
$-iq_0(x_0)$ lies on the positive half-axis. The integrability on $-i\mathbb{R}_+$ is ensured by the cubic term, and the integration
over $L_j\mathbb{R}_+$ is absolutely convergent due to the $\varepsilon$ term. When tilting the second half-line to $L_j\mathbb{R}_+ \to (L_j/2 + i\sqrt{3}/2)\mathbb{R}_+$
we can perform the $\varepsilon \to 0$ limit exactly, because in this more appropriate integration domain the cubic term dominates
on both half-lines. The integration over an eigenvalue spj in the fermion-fermion block consists of the two half-lines 
g^7TT/6R_^ y independent of L. 

Again we have to unfold the local fluctuations which leads to
(61)


To obtain $k$-point correlations, we also have to rescale the supermatrix $\delta\sigma$ and arrive at
(62)


The asymptotic result (62) can be written in terms of the Airy kernel and is thus equivalent to the result for the
GOE |5D]. We recall that the GOE is well-known to exhibit Airy statistics at the soft edges. We choose the local
scaling limit of the corresponding $k$-point correlation functions at the edge $x_0 = 2$ and find
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C{cr) = -strcT^ — istr leL + 
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nl 


(64) 


To arrive at the last equality of Eq. (63) we expanded the supermatrix according to $\sigma = i\mathbf{1}_{2k|2k} + \delta\sigma/(nl)^{1/3}$ and
identified the result with the right hand side of Eq. (62). Of course, the edge correlations of the GOE can also be
derived by other methods, e.g., orthogonal polynomials. We conclude that the correlated real Wishart ensemble ([^
shows, at any soft edge that behaves in a square-root fashion, spectral correlations of the Airy type known from the
GOE.


C. Distribution of the Largest Eigenvalue 


We consider a particular example to illustrate how useful the independence of the correlations and densities in the
limit of large matrix dimensions is. In particular, it leads to simpler analytical results. We study the cumulative
density function for the largest eigenvalue of the correlated Wishart matrix $WW^T$. As we have shown that the soft
edges as well as the outliers are independent of the generic degeneracy of the empirical correlation matrix $C$ in the
limit of large matrix dimension, we expect that this also holds approximately for the position and the width of the
largest-eigenvalue distribution. If outliers are not present and the largest eigenvalue lies at the upper soft edge we
find the Tracy-Widom distribution [60]. Later on, we will present numerical simulations which confirm that, in first
approximation, the distribution for the largest eigenvalue of the bulk of the eigenvalues indeed shows the expected
behavior, see Fig.
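The cumulative density function of the largest eigenvalue, i.e., the probability that no eigenvalue of $WW^T$ exceeds $t$, can be estimated by Monte Carlo sampling. The following Python sketch is an illustration only. It assumes Gaussian data $W = C^{1/2}G$ with a standard normal $p \times n$ matrix $G$ and the normalization $WW^T/n$, which may differ from the convention of the distribution $P(W|C)$ used in the analytical part. The function names are hypothetical.

```python
import numpy as np


def matrix_heaviside(A):
    """Heaviside function on symmetric matrices: 1 if A is positive definite, else 0."""
    return 1.0 if np.linalg.eigvalsh(A)[0] > 0.0 else 0.0


def sample_largest_eigenvalues(C, n, samples, rng):
    """Largest eigenvalue of W W^T / n for `samples` draws with W = C^{1/2} G."""
    p = C.shape[0]
    root = np.linalg.cholesky(C)  # any matrix square root of C serves the purpose
    largest = np.empty(samples)
    for s in range(samples):
        W = root @ rng.standard_normal((p, n))
        largest[s] = np.linalg.eigvalsh(W @ W.T / n)[-1]
    return largest


def gap_probability(t, largest):
    """Empirical estimate of the probability that no eigenvalue exceeds t."""
    return float(np.mean(largest <= t))
```

The doubly degenerate counterpart of an empirical correlation matrix `C` is obtained with `np.kron(C, np.eye(2))`, so that both ensembles can be sampled with the same routine and their gap probabilities compared on a grid of thresholds $t$.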

The cumulative density function for the largest eigenvalue of the correlated real Wishart ensemble with a generic
degeneracy $l = 2$ in the empirical correlation matrix $C \otimes \mathbf{1}_2$ is given by

$$E_{2p,n}(t) = \int d[W]\,P(W|C \otimes \mathbf{1}_2)\,\Theta(t\,\mathbf{1}_{2p} - WW^T), \tag{65}$$

where $\Theta$ is the Heaviside function on the symmetric matrices, i.e., it is unity if the matrix is positive definite and
zero otherwise. The function $E_{2p,n}(t)$ may also be viewed as the gap probability that none of the eigenvalues is larger
than $t \in \mathbb{R}_+$. Its derivative with respect to $t$ yields the distribution of the largest eigenvalue. In Ref. m we have
shown that such integrals can be mapped to invariant symmetric matrix ensembles. Then the cumulative density
function (65) can be rewritten as an integral over a $2n \times 2n$ real symmetric matrix $H$, namely
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(66) 


The limit $E_{2p,n}(t \to \infty) = 1$ yields the normalization constant


1 r exp[tT {iH + l 2 j)]d[H] 27r(“+2)/2 

det(2"+i)/2(*ij + i 2 ^.) " r[(2u-a + I)/2] 


which is a special form of the Ingham-Siegel integral [5^ 155] .
Without the degeneracies, the square roots of the determinants $\det(iH + (nt\lambda_j/2 + 1)\mathbf{1}_{2n})$ in the integrand (66),
cf. Ref. [17], would be most cumbersome. Luckily, the double degeneracy of each empirical eigenvalue combines
two square roots and yields a determinant to power one. This is a considerable advantage as compared to the
non-degenerate case. Hence, we can algebraically reformulate the integrand such that the integral drastically simplifies.
To this end, we diagonalize the matrix $H = OEO^T$ with $O \in \mathrm{SO}(2n)$ and $E = \mathrm{diag}(E_1, \ldots, E_{2n}) \in$
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( 68 ) 


where the integration over the orthogonal group [56] leads to the new normalization constant
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(69) 


Algebraic rearrangement [64] and the usage of skew-orthogonal polynomials uncover the Pfaffian structure
of the integral (68),
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with the kernel 
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(71) 


This result is only true for $p$ even. For $p$ odd we may augment the empirical eigenvalues $\Lambda$ with a dummy eigenvalue
$\lambda_{p+1}$ such that we effectively extend $p \to p + 1$ and eventually take the limit $\lambda_{p+1} \to \infty$. We refrain from showing
the details and stick to the case of p even in the sequel. The functions qi (x) in Eq. 0 are the Cauchy transforms 
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of the skew-orthogonal polynomials qi{E) (in monic normalization) according to 
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(73) 


with $a, b \in \mathbb{N}_0$. All other bilinear relations between the polynomials vanish. The constants $h_a$ follow from the
normalization constant (69),
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The Cauchy transform %{x) is readily derived as a Heine-type-of formula 
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(76) 


with an arbitrary constant $c_l$ which cannot be fixed with the skew-orthogonality relation but can be used as a gauge
parameter. The matrix $H$ is a $(2l + 2) \times (2l + 2)$ real symmetric matrix.

FIG. 2: The two empirical correlation matrices of a $12 \times 40$ time series (left plot) and a $40 \times 100$ time series (right plot) which
were employed for the Monte Carlo simulations. The strength of the correlation is color coded as shown in the legend.


The integral (75) is very similar to the gap probability Ep=i^n(t = 1) at t = 1, cf., Eq. (66), with the empirical 
correlation matrix C~^ —)■ 2xln, in particular we have 

q^i{x) = a;2(n-0-2 f exp[-a;trITIT'^]det^"”^'”^(l2 - WW'^)Q{l2 - WW'^) (77) 

I7r2" J 

with W a 2 X 2n real matrix. The Cauchy transform g 2 i+i(x) of the odd polynomials can also be expressed in terms 
of such an integral, as it can be traced back to a derivative of q 2 i{x), 


%i+i{x) = -t { X + Cl - 2i{l + 1) + 2{l + l)(n -1) + x-^ ) %i{x). 


dx 


(78) 


Setting Cl = 2i{l + 1) — 2{l + l)(n — 1) we have 
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l + ^)92.(x) 


(79)
where $W$ is a real $2 \times 2n$ matrix. We point out that the imaginary unit in Eq. (7^ cancels with the one in the kernel 0
such that the result is indeed as required. The integral ([7^ can be evaluated in closed form by diagonalizing the $2 \times 2$
Wishart correlation matrix $WW^T$ and integrating over the corresponding two eigenvalues. This yields the finite sum


^ (-l)'h;(6n - 4/- 5)! 1 2 (n i i) , 
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(80) 


The functions ${}_1F_1$ and ${}_2F_1$ are the confluent and Gauss' hypergeometric functions, respectively. The Cauchy
transforms of the odd skew-orthogonal polynomials can be evaluated via relation (79); we omit the details.

Altogether, we derived the rather simple and fairly explicit results (70), (71), (74), ( [^ and ( [M[ ) for the cumulative
distribution of the largest eigenvalue of $WW^T$ in the presence of degeneracies in $C$ ($l = 1$), cf. Ref. [17].
Without degeneracies, non-trivial analytical problems arise due to the square roots of determinants. Applying now
our observation that the spectral statistics become the same for large time series, our above results asymptotically
solve the corresponding problem without degeneracies. Hence, we developed a general method to obtain asymptotic
results for other quantities of the correlated real Wishart ensemble by artificially introducing double degeneracies in
the empirical correlation matrix $C$.

FIG. 3: Level densities as histograms for the real Wishart ensembles with the two empirical correlation matrices shown in
Fig. 2. Blue lines correspond to the non-degenerate and red lines to the doubly degenerate empirical correlation matrices.
The level densities around the outliers are shown on a magnified scale in the insets.


V. NUMERICAL SIMULATIONS 


For illustration and to show the robustness of our approximations and predictions, we carry out two
Monte Carlo simulations of the correlated real Wishart ensemble (?). We use a one-factor model, see e.g. Ref. [?],
to construct two sets of time series, $T_{12\times 40}$ ($p = 12$ and $n = 40$) and $T_{40\times 100}$ ($p = 40$ and $n = 100$). Each set
$T = T_0 + S_{\rm noise} T_1$ consists of a signal $T_0$ featuring three perfectly correlated sectors and a fully uncorrelated
white-noise offset $T_1$ such that $\langle \{T_1\}_{ab} \rangle = 0$ and $\langle \{T_1\}_{ab} \{T_1\}_{a'b'} \rangle = \delta_{aa'}\delta_{bb'}$. The strength of the noise is
tuned by the parameter $S_{\rm noise}$. In the simulations we choose $S_{\rm noise} = 3$ for $T_{12\times 40}$ and $S_{\rm noise} = 4$ for $T_{40\times 100}$.
From these sets of time series we derive the corresponding empirical correlation matrices $C_{12\times 40}$ and $C_{40\times 100}$.
They are shown in Fig. [?]. The three strongly correlated sectors show up as deep blue blocks on the diagonal although
the white noise is of the same order as the signal. The sizes of these blocks, (6,3,3) for $T_{12\times 40}$ and (20,12,8) for
$T_{40\times 100}$, mainly determine the positions of the three largest eigenvalues (outliers) of the corresponding empirical
correlation matrices, which are approximately diag(4.44, 2.17, 2.03) for $T_{12\times 40}$ and diag(15.61, 8.39, 5.08) for
$T_{40\times 100}$. However, we see strong shifts in Fig. [?] (left) for the smaller time series $T_{12\times 40}$ because of the relatively
strong noise and the relatively small matrix dimensions.
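The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `one_factor_series` and `empirical_correlation` are hypothetical, each sector is assumed to share one standard normal factor, and each row is normalized to zero mean and unit variance before forming $TT^T/n$. The relative scaling of signal and noise used in the paper is not specified here, so the sketch is not expected to reproduce the eigenvalues quoted in the text.

```python
import numpy as np

def one_factor_series(block_sizes, n, s_noise, rng):
    """Return T = T0 + s_noise * T1 with one common factor per sector (assumed model)."""
    p = sum(block_sizes)
    # Signal T0: all rows of a sector share the same standard normal factor,
    # so they are perfectly correlated within the sector.
    T0 = np.vstack([np.tile(rng.standard_normal(n), (size, 1)) for size in block_sizes])
    # Noise T1: independent entries with zero mean and unit variance.
    T1 = rng.standard_normal((p, n))
    return T0 + s_noise * T1

def empirical_correlation(T):
    """Normalize each row to zero mean and unit variance, then form Z Z^T / n."""
    n = T.shape[1]
    Z = (T - T.mean(axis=1, keepdims=True)) / T.std(axis=1, keepdims=True)
    return Z @ Z.T / n

rng = np.random.default_rng(1)
T = one_factor_series((20, 12, 8), n=100, s_noise=4.0, rng=rng)
C = empirical_correlation(T)
eigs = np.linalg.eigvalsh(C)[::-1]  # eigenvalues in descending order
print(eigs[:3])                      # candidates for the three outliers
```

With this normalization the diagonal of `C` is exactly one and the eigenvalues sum to $p$, which gives a quick sanity check on any variant of the construction.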

We numerically simulate the real Wishart ensemble for each of the two empirical correlation matrices $C_{12\times 40}$ and
$C_{40\times 100}$ constructed in this way and for their doubly degenerate counterparts $C_{12\times 40} \otimes \mathbb{1}_2$ and $C_{40\times 100} \otimes \mathbb{1}_2$. Altogether
we simulate four ensembles. The ensembles consist of 10® matrices for each empirical correlation matrix. These large 
ensemble sizes lead to high statistical significance. In Fig. [?] we present the macroscopic level densities including
outliers; the statistical errors amount to a few percent at most. The level densities employing the degenerate and
non-degenerate empirical correlation matrices show perfect agreement in the bulk of the empirical eigenvalues. Not
surprisingly, the agreement is better for larger dimension $p$. Nevertheless, even for low matrix dimensions $p$ and $n$, the
deviations in the bulk are small. At the edges and for the outliers the deviations become visible beyond the statistical
error. They result from the statistical fluctuations of the individual eigenvalues around their average positions, due
to the level repulsion caused by the overlapping tails of the individual eigenvalue distributions which are still present
at finite matrix dimension. In the bulk the eigenvalues are more abundant, implying that their respective positions
are sharper. In contrast, the eigenvalues near the soft edges explore the region outside the limiting support, while
they strongly accumulate at the hard edge as the cross-over to the negative real line is forbidden. This behavior is
suppressed by a generic degeneracy in the empirical correlation matrix. Although the empirical correlation matrix
might be degenerate, the corresponding Wishart correlation matrix $WW^T$ is not. Hence, for the doubly degenerate
matrix there are twice as many eigenvalues as in the non-degenerate case. This implies that the degenerate
case is closer to the asymptotic result (37) derived by the saddle point solution. In particular, the support becomes
more restrictive. The same discussion also applies to the outliers, whose overlaps with the other eigenvalues are more
suppressed when the empirical correlation matrix is doubly degenerate. We notice that the level densities around the
outliers only reach values that are up to two orders of magnitude smaller than in the bulk.
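A Monte Carlo comparison of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the text: the helper `wishart_eigenvalues` is hypothetical, the Wishart correlation matrix is normalized as $WW^T/n$, the degenerate ensemble uses $2n$ columns together with $C \otimes \mathbb{1}_2$, and a small illustrative matrix `C` with far fewer samples replaces the empirical correlation matrices of the simulations.

```python
import numpy as np

def wishart_eigenvalues(C, n, n_samples, rng):
    """Eigenvalues of W W^T / n for W = L G, with L L^T = C and real Gaussian G (p x n)."""
    p = C.shape[0]
    L = np.linalg.cholesky(C)  # Cholesky factor, so each column of W has covariance C
    out = np.empty((n_samples, p))
    for i in range(n_samples):
        W = L @ rng.standard_normal((p, n))
        out[i] = np.linalg.eigvalsh(W @ W.T / n)
    return out

rng = np.random.default_rng(2)
# Hypothetical small correlation matrix, used only for illustration.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
n = 10

ev_plain = wishart_eigenvalues(C, n, 2000, rng)
# Doubly degenerate counterpart C (x) 1_2; doubling n as well is an assumption.
ev_deg = wishart_eigenvalues(np.kron(C, np.eye(2)), 2 * n, 2000, rng)

# Level densities as normalized histograms on a common grid.
bins = np.linspace(0.0, 5.0, 51)
rho_plain, _ = np.histogram(ev_plain, bins=bins, density=True)
rho_deg, _ = np.histogram(ev_deg, bins=bins, density=True)
```

Plotting `rho_plain` and `rho_deg` against the bin centers gives the kind of comparison shown in the level density figure, here for a toy matrix only.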

FIG. 4: The distributions of the smallest (top row) and of the largest (bottom row) eigenvalues of the bulk,
normalized to zero mean and variance one. We consider again the same ensembles as in Fig. [?] with the 12 x 40 correlation
matrix (left column) and the 40 x 100 correlation matrix (right column) of Fig. [?]. The histograms for the non-degenerate
(blue) and the doubly degenerate (red) empirical correlation matrix are also compared to the approximation (81) for the
Tracy-Widom distribution (black smooth curve, TW) for real matrices. The agreement with the limiting Tracy-Widom
distribution is astoundingly good and even the leading order of the deviations from this distribution seems to be
independent of the degeneracy.

Although the level density of the bulk exhibits the strongest differences at its edges, the spectral statistics on the
local scale converge surprisingly well for the Wishart ensembles with and without the degeneracies in the empirical
correlation matrices. This is seen in Fig. [?], which displays the distributions of the largest and smallest eigenvalues at
the edges of the bulk. For the comparison, the numerical results are unfolded such that the distributions have zero mean
and unit variance. Moreover, the distributions of the smallest eigenvalue are mirrored at the origin to compare the
numerical results with the Tracy-Widom distribution [60] for real matrices, which should be the limiting distribution
for large matrix dimensions $p$ and $n$. The Tracy-Widom distribution indicates that the Airy statistics holds in this
regime. We employed the approximation

ETw{t) « 6.68 X 10"^®(t + 8.93)^®-®® exp(-8.93t) , t > -8.93 , (81) 


of the Tracy-Widom distribution [61]. Again, this distribution was normalized to zero mean and unit variance. The
agreement with the Tracy-Widom distribution is quite good despite the small matrix dimensions $p = 12, 40$ in our
numerical simulations. The more important result, however, is the good agreement of the two distributions for the
degenerate and for the non-degenerate empirical correlation matrices. We also mention that even the leading order
deviations of the numerical simulations from the limiting distribution (81) seem to be approximately independent of
the degree of the degeneracy $l$ in the empirical correlation matrix.
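The unfolding step used for this comparison is simple enough to state in code. The following is a minimal sketch with a hypothetical helper `standardize`; the gamma-distributed arrays are stand-ins for the extreme bulk eigenvalues collected from a Monte Carlo ensemble and carry no physical meaning.

```python
import numpy as np

def standardize(samples, mirror=False):
    """Shift and rescale samples to zero mean and unit variance.

    With mirror=True the result is reflected at the origin, as described
    in the text for the smallest eigenvalue.
    """
    x = np.asarray(samples, dtype=float)
    z = (x - x.mean()) / x.std()
    return -z if mirror else z

rng = np.random.default_rng(3)
# Hypothetical stand-in samples for the largest and smallest bulk eigenvalue.
largest = rng.gamma(shape=50.0, scale=0.1, size=5000)
smallest = rng.gamma(shape=20.0, scale=0.01, size=5000)

z_large = standardize(largest)
z_small = standardize(smallest, mirror=True)
```

After this step the histograms of `z_large` and `z_small` can be compared on a common axis with any reference density that has itself been normalized to zero mean and unit variance.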

The influence of the degeneracy in the empirical correlation matrix is strongest for the level density around the
outliers, see the insets in Fig. [?]. The reason was already discussed at the end of subsection IV A. The number of
eigenvalues associated with each outlier is equal to the degeneracy, namely $l$. Hence, the shape of the distribution for
each outlier strongly depends on $l$. However, the mean value and the standard deviation of the distributions around
the outliers should not change much with the degeneracy. To leading order we expect an independence, which is indeed
confirmed by the numerical simulations.

FIG. 5: Cumulative distribution functions cdf(t) around the three outliers for the real Wishart ensembles with the empirical
correlation matrices shown in Fig. [?] for the time series $T_{12\times 40}$ (left) and $T_{40\times 100}$ (right). Blue and red histograms
correspond to the non-degenerate and degenerate empirical correlation matrices, respectively. Black vertical lines indicate
the predicted positions (55) of the outliers and the grey shaded areas are the predicted fluctuations (57). The predicted
fluctuations for the two smaller outliers for the time series $T_{12\times 40}$ have imaginary values such that they have no grey
shaded areas.

In Fig. [?] the cumulative distribution function cdf(t) is depicted. Being independent of the bin size, it provides a
better measure than the distribution itself. The agreement with the analytical predictions of the positions (55) and
the fluctuations (57) for the three outliers is almost perfect for the set of the larger time series $T_{40\times 100}$ and is thus
seen to be independent of the degree of degeneracy. This also holds for the largest outlier in the case of the set
of the smaller time series $T_{12\times 40}$, while the two smaller outliers do not follow the analytical predictions at all. For
the fluctuations of these two eigenvalues we find imaginary values with Eq. (57), indicating that the approximation
discussed in subsection IV A fails. The reason becomes clear when looking at the inset of the left plot in Fig. [?]. The
two outliers overlap too much and even start to merge with the bulk. Hence, one has to modify the approximation
presented in subsection IV A as discussed below Eq. (58). Nonetheless, the cumulative distributions of the outliers
for the smaller and larger time series differ only marginally between the non-degenerate and degenerate cases.
This underlines our claim that even the outliers are in leading order unaffected by the (artificial) degeneracy.
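The bin-size independence of the cumulative distribution function is easy to see from how an empirical cdf is built. The sketch below is illustrative only; the helpers `ecdf` and `max_cdf_distance` are hypothetical names, and the maximal cdf distance is offered as one possible way to quantify the difference between the degenerate and non-degenerate samples, not as the measure used in the paper.

```python
import numpy as np

def ecdf(samples):
    """Return the sorted sample values x and the empirical cdf evaluated at x."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def max_cdf_distance(a, b):
    """Largest absolute difference between the empirical cdfs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the cdfs only jump at sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

x, y = ecdf([3.0, 1.0, 2.0])
print(x, y)
```

No binning parameter enters either function, in contrast to a histogram estimate of the density.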


VI. CONCLUSIONS 

Our study has produced three main results. The first one is that the spectral statistics of a real Wishart ensemble
with a given empirical correlation matrix are independent of an artificially introduced degeneracy of the empirical
eigenvalues. We derived this under moderate assumptions on the empirical correlation matrix and for an arbitrary
degree of degeneracy. It holds for the local as well as for the macroscopic bulk statistics. Surprisingly, even the
positions and the widths of the fluctuations of possible outliers are independent of the degeneracy. The differences
between the non-degenerate and the degenerate cases are strongest close to the edges of the bulk; in the
shape of the distribution around the outliers, statistically significant differences between the non-degenerate and the
degenerate cases emerge as well. We explained this behaviour theoretically and confirmed it with Monte Carlo simulations.

The second main result is that the bulk and soft-edge statistics on the local scale of the mean level spacing follow
those of the Gaussian Orthogonal Ensemble (GOE). As we used the supersymmetry technique, we had to handle
Efetov-Wegner boundary terms. We solved this problem employing Rothstein's theory and thereby exactly identified
the statistics in the correlated real Wishart ensemble with those in the GOE. Performing numerical simulations, we were
able to compare the distributions of the largest and the smallest eigenvalue of the bulk with the Tracy-Widom distribution.
The agreement is remarkably good even for small matrix dimensions.

Our third main result is a proposition, strongly corroborated by our analytical findings. As the degeneracies in the
empirical correlation matrices do not influence the spectral statistics in a relevant fashion, we suggest studying the
doubly degenerate case of an empirical correlation matrix instead of the non-degenerate one when one wishes to derive
asymptotic analytical results for observables such as the distributions of individual eigenvalues and the level density.
Due to the absence of square roots of determinants in the integrands, the doubly degenerate case is by far easier to
treat than the non-degenerate one. As an example we employed results of Ref. [?] for the cumulative distribution function
of the largest eigenvalue and derived an expression in terms of a Pfaffian in which all integrals are evaluated in closed
form. We expect that other spectral observables can be asymptotically computed with this new method as well. Of
course, for a finite number and length of the time series this approach only yields an approximation, but our numerical
simulations indicate that these approximations are quite good even for relatively small matrix dimensions.
Appendix A: Rothstein's Theory for Boundary Terms in Superanalysis

We consider an arbitrary diffeomorphism mapping one coordinate system $(y, \eta)$ of a superspace to another one
$(x, \theta)$. Here, we employ the notation of Rothstein [37], implying that $(x, \theta)$ and $(y, \eta)$ should not be confused with
variables we use in the body of the paper. The transformation of an integral over an arbitrary superfunction $f$ is
not purely given by the Berezinian (Jacobian) but also incorporates corrections, henceforth abbreviated "b.t.", the
so-called Efetov-Wegner boundary terms,


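$$\int f(y,\eta)\, d[y,\eta] = \int f\big(y(x,\theta), \eta(x,\theta)\big)\, \mathrm{sdet}\, \frac{\partial(y,\eta)}{\partial(x,\theta)}\, d[x,\theta] + \text{b.t.} \tag{A1}$$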

One can control these boundary terms by splitting the diffeomorphism into two steps. First we map the coordinate
system to the numerical part $y_0$ of $y$ and to the first order part (in the Grassmann variables $\theta$) $\eta_1$ of $\eta$. We denote
the intermediate coordinates $(x', \theta')$ such that

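$$\int f(y,\eta)\, d[y,\eta] = \int f\big(y_0(x'), \eta_1(x',\theta')\big)\, \mathrm{sdet}\, \frac{\partial(y_0,\eta_1)}{\partial(x',\theta')}\, d[x',\theta'] . \tag{A2}$$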

This transformation is free of Efetov-Wegner boundary terms because $y_0$ does not contain any Grassmann variables
$\theta$. In the next step we can generate the remaining diffeomorphism by a vector field $Y(x,\theta)$ via $(y(x,\theta), \eta(x,\theta)) =
(y_0(x'(x,\theta)), \eta_1(x'(x,\theta), \theta'(x,\theta))) = \exp[Y(x,\theta)]\,(y_0(x), \eta_1(x,\theta))$, which yields the full transformation formula

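$$\int f(y,\eta)\, d[y,\eta] = \int \exp[-Y(x,\theta)]\, f\big(y_0(x), \eta_1(x,\theta)\big)\, \mathrm{sdet}\, \frac{\partial(y_0,\eta_1)}{\partial(x,\theta)}\, d[x,\theta] . \tag{A3}$$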

The correctness of this procedure was proven in [?, Chapter 3].

Two properties of the vector field $Y$ are known. First, it is a nilpotent vector field and a sum of even orders in the
Grassmann variables. Thus the operator $\exp[-Y]$ is a finite sum of powers of $Y$ with the maximal power equal to
half of the number of Grassmann variables. In our problem it would be $2k^2$ and hence independent of the dimensions
$p$ and $n$. The second property of the vector field is that it only depends on the coordinate transformation and not on
the integrand. We make use of this property in our calculation when identifying the $k$-point correlation function of
the correlated Wishart ensemble with the sine kernel for the GOE.
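Written out, the first property means that the exponential series terminates. As a sketch, let $N$ denote the total number of Grassmann variables (a symbol introduced here only for illustration) and assume that every term of $Y$ is at least of second order in them. Then $Y^j$ is at least of order $2j$ and vanishes once $2j > N$:

```latex
\exp[-Y] \;=\; \sum_{j=0}^{N/2} \frac{(-1)^j}{j!}\, Y^j ,
\qquad
Y^j = 0 \quad \text{for } j > N/2 .
```

The upper limit $N/2$ is the maximal power referred to in the text, so the number of correction terms is fixed by the supersymmetric structure and not by the matrix dimensions.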